ABSTRACT

the Phase IIA Final Report on the Center for 
ices project at UCLA ascribes early design 
the Institute of Library Research* Both 
and computer aspects of information services 
are covered. The Appendices present various 
and an inventory of programs written or 
staff during this Phase. 
NOTE ON REPORT FORMAT



The main body of this report was keyboarded using a
video console, corrected and formatted for publication by
computer using an ILR-developed program (FMS), and printed
directly in the format shown.




I. INTRODUCTION AND BACKGROUND 



Since 1966, the Institute of Library Research at UCLA
has been engaged in a large-scale project centering upon
computerized information stores ("data bases") and the
prospects for their use in a university environment. The
first phase of this investigation, a feasibility study, was
completed and reported to the National Science Foundation in
December 1967.* Since the present report is a direct
continuation of that work, it is appropriate to discuss it
briefly here.

The feasibility study of "Mechanized Information 
Services in the University Library" began with the postulate 
that computerized data bases were becoming widely available, 
that they represent a major new information resource that 
the university cannot ignore, and that, therefore, the 
university must begin to make plans to deal with data bases 
and their impact on teaching and research. A structure was 
needed, an organizational and technical framework within 
which computer-processible data stores could be integrated
into the existing information network of the university. 

The feasibility study suggested a rational method for 
the university to accomplish the enormous and complex task 
of providing its community with information services from 
computerized data bases. This method entailed two innovative 
hypotheses. 

On the technical side, some way had to be found of
avoiding the projected consequences of the programming
situation as it was in 1966-7, in which each data base had
its own individually tailored program or set of programs to
gain access to the data. Extrapolated over only one decade
or so, the idea of a university trying to service even a
modest selection of 15-20 data bases, each with its own
programming system, written in any one of many possible
programming languages and designed to operate on any one of
several possible types of computer, raised so many obstacles
(technical, economic, service-oriented and administrative)
as to make it clearly untenable as a long-term answer.
Accordingly, a programming strategy was proposed which would
enable the host system to deal with any input data file on
its own terms. For this, three subsystems were envisaged,
corresponding to the three predominant types of data bases
identified by the study: reference, full-text and numerical.
The programs which comprised each of these subsystems would
be task oriented rather than file oriented, permitting a
given process to be executed upon any data base of the
appropriate type. The university might thus sidestep the
huge problem of acquiring and trying to operate many
disparate program packages, and concentrate on acquisition
of the data bases.

* Hayes, R. M. Mechanized Information Services in the
University Research Library. Phase I, Final Report
(parts 1-13). University of California, Institute of
Library Research, Los Angeles. December 1967.



On the administrative side, an agency with the
capability of processing whatever data bases it wished, and
of offering the needed range of services from them, did not
exist. Since the proposed activities combined some of the
functions of the library with some of the functions of the
computing center, there were three alternatives: locate the
proposed agency in the library and make whatever
arrangements were necessary to assure computer hardware and
software support; locate the agency in the computing center
and make the necessary arrangements for library and
bibliographic support; or establish an entirely new
organizational framework for computerized information
services. Bearing in mind that such an agency would not be
an Information Center, but rather a Center for Information
Services, and that there was more evidence that libraries
were beginning to harness computer technology than there was
evidence that computing centers desired to take on the full
panoply of library services, the suggested location was the
library. (The very term "Center for Information Services"
defines, in a fundamental sense, any library's enduring
mission.) To create a new agency would incur large
additional costs without any assurance of success and with
the danger of needlessly duplicating part of the activities
of both the library and the computing center.

During Phase IIA of the "Center for Information
Services" project at UCLA (NSF Grant GN-827) the Institute
of Library Research has continued to explore the areas
defined in the Final Report on Phase I, namely the
technical, bibliographic and administrative problems
involved in creating a university-library-based capability
for the provision of a broad range of information services
from a broad range of data bases. In reporting the
experimental activities of ILR during this period, this part
of the Phase IIA Final Report deals with data base handling
from two points of view: Library Aspects and Computer
Aspects. It should be recorded that the section on Computer
Aspects relates only to the first half of the Phase IIA time
period, after which several ILR technical staff members were
transferred to the Campus Computing Network, where a
programming team was being formed. Subsequent technical work
by this team is presented in Part 3 of this Final Report.



II. LIBRARY ASPECTS



INTRODUCTION 

The Inventory of Data Bases, part 3 of the
Final Report on Phase I of this project, has turned out to
be one of the most consistently demanded items to emerge
from that study. It emphasized bibliographic reference
files, but also contained examples of the other two types
(numerical files and full-text files), and it concentrated
mainly upon the well-publicized, nationally available
material. Within this framework, it listed about 40 data
bases, emanating from about 30 distributers. However, the
prospect of having the university library acquire and
provide service from a wide range of data bases gave rise to
many questions beyond those of mere availability. In
particular, what were the file characteristics of specific
data bases, and the degree of compatibility between various
files? What kind of effort did it require simply to read
the contents of a file? What were the costs of performing
certain standard processes upon them? What did the faculty,
as the first group of would-be users, know about data bases,
and what did they expect to see in terms of service from
them? How might data bases be ordered, and how were they to
be handled once they had arrived? Was the documentation
accurate and/or adequate? In short, the paramount need was
for project staff to gain experience in working with tapes,
identifying and coping with problems, both those specific to
a file and those general to some or all files. Only from
hard, practical experience would it be possible to start
designing programming strategies and library procedures to
handle data bases in the generalized manner called for.
Phase IIA thus offered both the programmers and librarians
engaged in this project what most other institutions have
not had, but which we find to be an almost indispensable
prerequisite to any attempt at acquiring and processing data
bases in an efficient and systematic manner; that is, time
to experiment on a fairly modest scale with various
solutions before plunging into the complexities of real-life
commitment, for example through a binding subscription or
through the purchase of some major data base like the
U.S. Census of 1970.

The Institute of Library Research has met with
considerable generosity from distributers who have provided
ILR with a sample reel upon request even though they
understood that we needed it solely for general research
purposes and could make no commitment relative to the whole
file.





But the operative phrase here is "upon request". ILR,
as a research organization, has many contacts in the world
of computerized data bases, and can often arrange to have
courtesy access to a file where a librarian working in a
library could not. Naturally we have benefited from this
situation; it has enabled us to work with several sample
files at no cost for the data. The Census Bureau in
particular is to be commended for issuing a series of sample
reels (based on the dress rehearsal census in Madison,
Wisconsin) beginning well over a year in advance of the 1970
Census itself. A few other distributers have created test
reels which they send to prospective clients. Most,
however, do not appear to have grasped the sound business
sense of doing this as a matter of course. We recommend
that some appropriate organization of data base purchasers,
such as the American Library Association or the recently
formed Association of Scientific Information Dissemination
Centers (ASIDIC), take the lead in defining the conditions
which would make the acquisition of a sample tape, plus a
certain specified level of technical documentation, a
regular element in the decision-making concerning the
acquisition of a file. For example, to safeguard the
investment of distributers (although this is a sellers'
market and is likely to remain so for many years), a fee of
about $100 might be charged for a test tape, to be applied
toward the purchase price, or the first subscription, that
eventually resulted. There is no doubt of the benefit to
the prospective purchaser in having a sample reel to "play
with"; there ought to be no doubt in the mind of the
distributer that it materially increases his chance of a
sale.



The development of computerized information services,
of course, presupposes the existence of available personnel
to do the necessary work (which, as far as libraries are
concerned, will usually include a re-examination of the
basic library functions: acquisition, cataloging, and public
service). For this, there is really no alternative except
the establishment of a library systems office, however small
it may be to begin with; but university libraries are aware
of the need for a systems staff for many other reasons, and
most now have one or have plans for one.



PRE-SELECTION ACTIVITIES
Data Base File 



An ongoing task of ILR during Phase IIA has been to
maintain the Data Base File, from which the Inventory was
compiled, and to provide a broad base of bibliographic and
published information about existing and anticipated data
bases upon which selection and acquisition decisions will
eventually be made. This collection is probably as complete
a body of information on generally available data bases as
presently exists within a university library framework. No
updated version of the Inventory was scheduled for Phase
IIA, though this is certainly a desideratum later in the
project. In the meantime, a somewhat augmented version of
the original Inventory has been published in Hayes, R. M.,
and J. Becker: Handbook of Data Processing for Libraries,
Wiley-Becker-Hayes (1970).



Like the Inventory which was drawn from it, the Data
Base File deals mainly with the bibliographic and nationally
available items, the known and marketed data [...]
there are many others. Bibliographic reference [...]
emphasized from the start because it was one of [...]
types to become generally available and, since it [...]
a form of information which is already known to [...]
via the printed versions, is likely to be the [...]
concern of any library planning to acquire data bases.

In the area of numerical files, with the 1970 Census
leading the way in terms both of scope and probable
utilization, increasing numbers of socio-economic data bases
are becoming available: statistical information on aspects
of urban living such as transportation, sanitation,
employment, school districting, land ownership and use,
economic levels and voting patterns. In addition, there are
vast quantities of raw numerical data pouring out from
artificial satellites, and many large hospitals now utilize
computers for a variety of tasks, among them the gathering
and analysis of patient-monitoring data which later can
assume added value as part of a statistical file of
biomedical data.

As examples of full-text data we may point to the
statutes and constitutions of all 50 states, and the growing
number and range of literary texts upon which computer-based
experiments are being conducted. A good indication of the
extent of the work now going on in the humanities (also
covering much of the social sciences) can be found in the
"Directory of Scholars Active" published periodically in the
journal Computers and the Humanities. The main body of this
part of the Final Report is an example of full-text data.
The text has been keyboarded via a video console, formatted
for publication by computer using the Format Manipulation
System (FMS) described in Appendix G, and can be further
manipulated by computer in either the original (keyboarded)
form or in the formatted version.

The important point is that if and when libraries
really take control of this new information medium,
numerical and full-text data are going to be at least as
important as the bibliographic data, and appreciably more
difficult to handle.




A comprehensive inventory of all of these data bases is 
a basic national requirement; one whose execution is likely 
to be large enough to demand a project of its own. 

Tape Documentation File

In Phase IIA, a Tape Documentation File was created.
This file holds the technical documentation about a data
base, as opposed to the bibliographic information, and
contains such items as:

a. A copy of the original documentation accompanying
the tape.

b. A report by the staff member responsible for initial
work such as "opening" the tape and identifying the
elements of the file structure.

c. Any relevant printout associated with (b), e.g. the
dump, the read program (plus the card deck if
possible) and a specimen of formatted printout.

d. Any technical correspondence, e.g. to the
distributer or to another UC campus, itemizing gaps
in the original documentation, discussing file
characteristics, etc.

e. Later work as it occurs, e.g. a record of what was
being attempted, plus some sample printout.

The establishment of such a file by the purchaser or
processor of data bases will help to guarantee the order and
continuity of his work with tapes, for example if (as is
envisaged with CIS) the user will ultimately be able to
create his own programs to manipulate the data. At ILR, the
Tape Documentation File was assembled with the additional
intention of providing technical continuity when the UCLA
library eventually assumes responsibility for the CIS
system.



Program Inventory

An inventory of programs and subroutines was prepared,
which is reproduced as Appendix G. It contains various
subroutines and complete programs written or utilized by the
ILR staff for experimentation with data bases and supporting
activities.
WORKING WITH DATA BASES

Sample Files

Throughout Phase IIA it was imperative that project
staff familiarize themselves with the structure and scope of
as many different data bases as was reasonably possible. In
this way both groups of staff, librarians and programmers,
would attain a sense of what was typical in data bases and
what was unique to a particular file, and gain insights into
the whole range of problems surrounding generalized file
management and information services; problems that range
from the smallest technical question to the broadest
administrative issue. During the reporting period, ILR
project staff had access to a representative cross-section
of files and sample files as follows:



American Institute of Physics -- SPIN/G bibliographic
file (sample)

American Petroleum Institute -- bibliographic file
(sample)

Aspen Corporation -- California constitution and statutes
(sample)

Atomic Energy Commission -- Nuclear Science Abstracts
(sample)

Bureau of the Census
--Census of Households (1960), 1/1000 sample
--Census of Housing and Population (1970),
first and second counts, test reels

Chemical Abstracts Service -- CA Condensates, January

Communications of Behavioral Biology (sample)

Engineering Index, Inc. -- COMPENDEX (sample)

ERIC (USOE Educational Resources Information Center)
--Current Index to Journals in Education,
July-September 1970
--Research in Education, complete to August
1969, and July-September 1970
--Thesaurus of Descriptors

Library of Congress
--MARC distribution service, 1969-
--Subject headings, 7th edition

National Library of Medicine -- MEDLARS

Standard and Poor's Corporation -- COMPUSTAT

Webster's New Collegiate Dictionary -- 7th edition

Obviously, the level of involvement varied from file to
file, depending upon the type, quantity and availability of
data and the nature of the work being undertaken, but the
end result, a fund of experience, is more significant than
how one specific file was treated.



Read Routines

A number of read routines were prepared as an exercise
in understanding the immediate, practical problems which
occur when a new file of data arrives. Six of these, for
the American Petroleum Institute file; the U.S. Census of
1960 1/1000 sample; Communications of Behavioral Biology;
COMPENDEX; Nuclear Science Abstracts; and CA Condensates, are
presented as Appendices A to F to this part of the Final
Report.

It should perhaps be clarified that a "read routine" in
this sense is not merely a matter of mounting the tape and
asking the computer to print out exactly what is recorded
thereon. This latter process is customarily known (for
obvious reasons) as getting a "dump" of the contents, and in
most computer systems there are standard utility programs to
do it. However, what emerges from a "dump" is a machine
representation of the information, encoded and perhaps
doubly encoded, forming an oddly arranged stream of
characters unintelligible to anyone but a programmer who has
read the documentation. A "read routine" is the program
which instructs the computer to read what it finds recorded
and to print out the information, formatted and readable by
humans. This in turn involves telling the computer the file
structure: how the various fields of data comprising a
record are arranged, where and how to locate each field and
assign its correct name, what is the maximum number of
characters allowed for each field, how to recognize
print-control characters and suppress them in the printout,
what to do when a blank field is encountered, etc. This can
become a complex and time-consuming operation; it can take a
skilled programmer a man-week or more of work, or anywhere
from 10 to 30 runs on the computer, to effectively "open"
the tape, i.e. bring matters to the point where the
organized retrieval of information can begin to be
attempted.
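
The distinction between a dump and a read routine can be
illustrated with a minimal sketch in a present-day language
(Python). It is not one of the ILR routines of Appendices
A to F; the field names, offsets and lengths in LAYOUT are
hypothetical, standing in for what the distributer's
documentation would supply, and fixed-length fields are
assumed for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str    # label printed beside the value
    start: int   # zero-based offset of the field within the record
    length: int  # maximum number of characters allowed for the field


# Hypothetical layout: in practice this table is transcribed from the
# technical documentation that accompanies the tape.
LAYOUT = [
    Field("Accession number", 0, 8),
    Field("Author", 8, 20),
    Field("Title", 28, 40),
]


def clean(raw: str) -> str:
    """Suppress print-control characters and trim padding blanks."""
    return "".join(ch for ch in raw if ch.isprintable()).strip()


def read_record(record: str, layout=LAYOUT) -> dict:
    """Locate each field, name it, and flag blank fields explicitly."""
    fields = {}
    for f in layout:
        value = clean(record[f.start:f.start + f.length])
        fields[f.name] = value if value else "(blank)"
    return fields


def format_record(fields: dict) -> str:
    """Produce the human-readable printout a dump does not give."""
    return "\n".join(f"{name}: {value}" for name, value in fields.items())


# A made-up record: the title field starts with a form-feed control character.
record = ("A0001234"
          + "SMITH, J.".ljust(20)
          + "\x0cDATA BASES IN LIBRARIES".ljust(40))
print(format_record(read_record(record)))
```

Most of the labor described above lies in getting a table
like LAYOUT right, which is why the quality of the
documentation matters so much.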




Documentation



The technical documentation purports to be the detailed
and complete description of the file structure and how to
access it, and is therefore an indispensable operating
manual in any data base system. If programs are being
provided, there will customarily be descriptions of them as
well as of the file, but since our investigations are
directed toward the establishment of a CIS which will not be
purchasing or utilizing distributers' programs, this
discussion essentially relates to file descriptions.
Project staff have identified six closely linked criteria by
which prospective purchasers, in particular library staff,
can evaluate the utility of technical documentation:



Promptness. Documentation can precede, accompany or
follow the arrival of the file to which it refers; there are
practically no guides to this at present, because the
"marketing" (in the broadest sense) of data bases is only
just becoming a recognized, systematic publishing activity.
Obviously, if the documentation comes early, especially if
it was unsolicited and is being used as a (supposed)
advertisement for the product, it has little significance,
having no immediate referent; if it is late there is
virtually no progress to be made with the tape until it
arrives. The too-early provision of documentation may
signal that it is so readily available because it is no
longer fully up-to-date, and conversely, the distributer's
natural desire to supply the latest documentation may mean
one has to wait for it after the arrival of the file, or the
first portion thereof. The next issue to examine is,
therefore, that of currency.

Currency. Assuming that it arrives on time, the
documentation must be up-to-date. The fact that the
state of the art of data base creation is still very
volatile has led, in our experience, to a tendency to supply
recently superseded documentation, usually with some
informal updating such as hand-written alterations in the
text, brief notes in a personal letter or, more rarely,
mimeographed insert pages. This can be attributed to the
natural logistical inertia of the printed medium; when much
concentrated effort has been expended in getting out a
document, it is both economically and psychologically
impossible to avoid treating it as final and authoritative,
dealing with any amendments as minor errata and corrigenda,
best handled in the context of the overall correctness of
the main document. It seems clear that machine methods of
technical document storage, manipulation and reproduction
could well constitute part of the solution to this inertia.

Accuracy. If what is supplied is current information
as far as the distributer is concerned, the next question is
"Is it also accurate?", and it is self-evident that it
should be. A comment on typography may be made here; while
a simple clerical error within a word (e.g. letter
transposition) usually causes no trouble, the same type of
error within a number (e.g. digit transposition) is almost
invariably undetectable until an inordinate amount of
checking has been done, by which time a great deal of time,
money and effort has already been lost. If the blocksize is
given as 5600 and it should have read 6500, the computer may
print out the information (assuming all else in the read
program is working) minus the last 900 characters, or it may
not be able to begin.
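
The arithmetic of the blocksize example can be checked with
a short sketch (Python). The function read_block is a
hypothetical model of a read that silently keeps only as
many characters as the stated blocksize allows; as the text
notes, a real system might instead refuse to proceed at all.

```python
TRUE_BLOCKSIZE = 6500    # what is actually recorded on the tape
STATED_BLOCKSIZE = 5600  # what the documentation says (digits transposed)


def read_block(physical_block: bytes, buffer_size: int) -> bytes:
    """Model a read that keeps only what fits in the declared buffer."""
    return physical_block[:buffer_size]


block = bytes(TRUE_BLOCKSIZE)             # one block of placeholder data
received = read_block(block, STATED_BLOCKSIZE)
lost = len(block) - len(received)
print(f"characters lost per block: {lost}")
```

The loss recurs in every block, so the printout looks
plausible while being systematically incomplete, which is
what makes this kind of error so hard to detect.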



Completeness. Even if the documentation is on time, is
current and is accurate, one must always try to determine
whether what is supplied is complete. In fact, experience
so far shows that simple failure to itemize or mention
certain of the details that the programmer needs to know is
a more prevalent fault than the outright mis-stating of
them, and is therefore potentially a larger problem for the
library. Up to this point, creating data base file
structures and using the computer to retrieve from them has
been a sufficiently exclusive activity for communication to
be built on unspoken assumptions, and the natural tendency
not to bother documenting something you assume everyone in a
relatively small field takes for granted can lead to
horrendous problems at the point of acquisition. For
example, when an ILR staff member comments:

    The file has a tape mark at the beginning where
    one normally expects to find a label or nothing at
    all. This was not mentioned in the accompanying
    documentation and led to some waste of time and
    effort... (Emphasis ours)



The point is not so much whether she was correct in her
assumption (perhaps the writer of the documentation would
have disagreed that "one normally expects to find a label or
nothing at all") but that she had to make the assumption.
That particular file caused certain additional problems, and
the tape was "opened" in 29 runs at a machine cost of
$46.08. Add to that the full-time equivalent of about one
week of work by an experienced programmer (for whom a
reasonable average salary estimate is $1100 per month) and
the true cost is revealed to be in the region of $300.00,
which quite clearly cannot be contemplated for each of the
15-20 data bases that it is postulated a CIS would acquire
in its first few years of operation.
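
The cost estimate can be reproduced with a few lines of
arithmetic (Python). The machine cost and monthly salary are
the figures given above; converting the salary to a weekly
figure by taking 52 weeks per year is our assumption, and
overhead is ignored.

```python
MACHINE_COST = 46.08      # 29 runs, as reported in the text
MONTHLY_SALARY = 1100.00  # salary estimate given in the text
WEEKS_PER_YEAR = 52       # assumption used to derive a weekly rate

labor_per_week = MONTHLY_SALARY * 12 / WEEKS_PER_YEAR
cost_per_tape = MACHINE_COST + labor_per_week
print(f"cost to open one tape: ${cost_per_tape:.2f}")

# Projection over the 15-20 data bases postulated for a CIS.
for n_data_bases in (15, 20):
    print(f"{n_data_bases} data bases: ${cost_per_tape * n_data_bases:,.2f}")
```

Under these assumptions the per-tape figure comes out at
about $300, in line with the estimate in the text, and the
projected total runs to several thousand dollars before any
retrieval work has begun.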



Stylistic Clarity. By this we mean such things as
terminological precision, systematic presentation, avoidance
of ambiguities, etc. This too can cause disproportionate
trouble. Such statements as "the file consists principally
of..." without specifying what else may be present, are
needlessly vague; and the following from some early ERIC
documentation shows a confusion over the use of the words
"record" and "block" that can cause trouble for the
programmer:

    ERIC Report Resumes are stored on magnetic tape (9
    channel, 800 bpi) in accession number sequence in
    the form of one or more IBM System/360 Operating
    System variable length records. Each ERIC Report
    Resume equals one record on magnetic tape. These
    records are grouped in variable length blocks
    which have a maximum length of 32,000 bytes.
    (Emphasis ours)

The point of this illustration is by no means to
castigate the ERIC system (which, taken all round, is
actually one of the better ones to work with) but to draw
attention to one perennial source of confusion in the effort
to describe for someone else exactly what is on a reel of
magnetic tape, and that is the widespread uncertainty about
the group of concepts relating to the units of mechanized
information storage (especially on magnetic tape), such as
physical records, logical records, blocks, blocksize,
blocking factors, etc. Many people, including programmers
and system analysts, do not seem to be quite sure that you
understand these items to mean what they understand them to
mean. The prevalence of the IBM System/360 computers has
fostered some standardization of concepts, but it is a
circumstantial, rather than a theoretical, phenomenon, and
could just as easily evaporate with a change in the market.
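
The difference between a block (the physical unit read from
tape) and the logical records inside it can be made concrete
with a small sketch (Python). It assumes a layout modeled on
System/360 variable-length blocked records: a 4-byte block
descriptor whose first two bytes give the block length,
followed by records each carrying a 4-byte descriptor whose
first two bytes give the record length, both lengths counting
the descriptor itself. Whether any particular tape follows
exactly this layout is for its documentation to say; the
helper make_block exists only to build test data.

```python
import struct


def deblock(block: bytes) -> list[bytes]:
    """Split one physical block into its logical records."""
    (block_len,) = struct.unpack(">H", block[0:2])  # includes 4-byte descriptor
    records = []
    pos = 4
    while pos < block_len:
        (rec_len,) = struct.unpack(">H", block[pos:pos + 2])  # includes descriptor
        if rec_len < 4:
            raise ValueError("corrupt record descriptor")
        records.append(block[pos + 4:pos + rec_len])
        pos += rec_len
    return records


def make_block(payloads: list[bytes]) -> bytes:
    """Build a block in the assumed layout (illustration only)."""
    body = b"".join(struct.pack(">HH", len(p) + 4, 0) + p for p in payloads)
    return struct.pack(">HH", len(body) + 4, 0) + body


block = make_block([b"FIRST RESUME", b"SECOND, LONGER RESUME"])
for record in deblock(block):
    print(record.decode())
```

One block here holds two records of different lengths, so a
statement that a resume "equals one record" says nothing
about how many resumes arrive in each physical read.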

Listing a data element called "Access Words", and
neither specifying that none occurs in the entire test tape
nor explaining how "Access Words" (were they to appear)
would differ from subject terms; repeating the "Subject
Heading and Subheading" field, the first time linking the
subject term to its subdivision by a dash and adding a
document number into the field, the second time linking the
elements with a comma and omitting the document number
(which also appears, unannounced, at the end of every
abstract); referring in the documentation to a "Card Service
Category Code" whereas it appears to be designated "Sales
Code" on the machine record itself: these are typical
examples of the imprecise and indeed sometimes careless
nature of much documentation as presently found. Any
librarian who must deal with it will be disappointed at the
general laxity, especially since relatively large sums of
money are invariably at stake, and, incidentally, vis a vis
reference files, will be particularly amazed at the variety
of ad hoc methods of giving a bibliographic citation in the
record, representing decisions already taken which will be
practically impossible to correct retroactively in time to
come, when library networks have made standardization of
formats, of catalogs or holdings lists, and of documentation
an economic necessity.



Level of formality. As indicated above, significant
and useful material relating to the technical description of
a file often appears in an informal fashion: as comments
embedded in a personal letter, as hand-written addenda, or
as loose-leaf photocopies of what are obviously internal
work-sheets, flow-charts, translation tables, etc. It
appears that the best compromise between formality (implying
here delays and also rigidity) and informality (implying
here scattered, inaccessible, and sometimes incoherent
information) would be to have a basic printed document in
loose-leaf form, highly modular in arrangement, and an
agreement to keep it updated. As noted above, use of the
computer to produce and update this document should be tried
where feasible. A trend toward loose-leaf manuals is
apparent, but since most never get updated, that special
virtue is often lost.



Field Names



A brief study was done at ILR to determine whether a
common set of data elements could be defined for
bibliographic files; five serial bibliographic data bases
and their documentation (CA Condensates, Communications of
Behavioral Biology, COMPENDEX, NSA Entry file, ERIC Report
Resumes) were examined. From these alone, it was possible
to list 45 different bibliographic elements (see Figure 1),
and if the MARC record for either serials or monographs had
been added, the total would have been about 115 (MARC for
monographs has over 50 fields currently in use, and another
20 defined). Four of the files contained at least one
bibliographic element that appeared to be unique, although
this is often difficult to ascertain precisely, because
frequently the "same" element, under a variety of names
(e.g. Subject terms, Keywords, Selector terms, Descriptors,
Access words, Index terms, etc.) has a slightly different
structure, giving it a slightly different potential for
retrieval.



Machine-readable bibliographic files are notable for a
propensity to give several numbers to the citation: one or
more accession numbers, an abstract number, citation number,
report number, etc. The designers of MARC found it
profitable (even for monographs) to divide the many possible
numbers into Control Numbers and Knowledge Numbers. The use
of numbers is complemented by a series of codes, usually one



Figure 1

Bibliographic Data Elements

Files compared (column headings): CA, CBB, COM, NSA, ERIC.
(The X marks showing which file carries which element are
not recoverable here.)

Data Elements

Abstract
Abstracting journal: vol., issue nos.
Address of principal author
Affiliation
Assignee (corporate owner of patent)
Author(s), corporate
Author(s), personal
Availability (distribution)
Citation number
Coden
Conference preprints (society, date, pages, and price)
Contract number
Corporate code
Date of publication, month
Date of publication, year
Date(s) (patent)
Descriptor terms
Drop note
Field group (codes)
Field group (words)
ID number (accession number)
Journal title (full)
Journal title (abbr)
Journal volume number
Journal issue number
Language
Microfilm code
Pagination (page numbers, e.g. pp. 12-220)
Pagination (number of pages, e.g. 12p.)
Patent number
Place of publication
Price
Publisher
Report number
Sales codes
Secondary number
Security code
Series title
Short title
Source article
Source of information
[...]
Title (in English)
Translation note
Vernacular title

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digit or character in length# specifying the existence or 
otherwise of various conditions. For example# a "Security 
code” might allow the binary choice of "Yes™ or "No" to the 
implied question "Is this document for general 
distribution'?" or it might allow a graduated series of 
responses to reflect several different levels of 
availability. The numbers# tags and codes spread throughout 
a machine— readable bibliographic record (which frequently 
serve to make the machine record a significantly different 
entity from its printed counterpart) are the means to take 
advantage of the computer’s ability to do very precise 
retrieval and high-speed sorting. Any field which is a 
computer- identifiable unit# a "data element"# can be 
retrieved from each record# and similarly any field can be 
designated a sort key and used to generate subject Indexes# 
author lists# title lists, chronological lists# etc. 
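
The following sketch is illustrative only. It is written in 
present-day Python rather than in the languages used by the 
project, and the sample records and field names are invented 
for the example. It shows how any data element may serve 
either as a retrieval key or as a sort key. 

```python
# Hypothetical records; the field names and values are invented for illustration.
records = [
    {"id": "0003", "author": "Smith, J.", "title": "Zinc in swine diets", "year": 1969, "security": "N"},
    {"id": "0001", "author": "Adams, R.", "title": "Yeast as a feed supplement", "year": 1970, "security": "Y"},
    {"id": "0002", "author": "Moore, L.", "title": "Amino acids for pigs", "year": 1968, "security": "Y"},
]


def retrieve(recs, element, value):
    """Return the records whose named data element equals the given value."""
    return [r for r in recs if r.get(element) == value]


def listing(recs, sort_key):
    """Return the records ordered by the data element chosen as sort key."""
    return sorted(recs, key=lambda r: r[sort_key])


author_list = [r["author"] for r in listing(records, "author")]
chronological = [r["id"] for r in listing(records, "year")]
general_distribution = [r["id"] for r in retrieve(records, "security", "Y")]
```

The same two functions serve every field, which is the point 
of treating each field as an identifiable data element. 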

It was rapidly decided that attempting to establish a 
canonical set of names for data elements having different 
names in different files was not a feasible approach to 
take. Subsequent work (at CCN) was therefore oriented 
toward creating individual read routines for each file. 



Test Profiles 



Chemical Abstracts. The CA Condensates Search Service 
at UC Riverside (see part 2 of this Report) was utilized in 
order to give ILR and Library staff some practical 
experience with constructing profiles, analyzing the results 
of searches, refining the profiles, and gathering 
statistics on computerized information retrieval. In the 
CIS Seminar of April-June 1970 (see part 6 of this Report) 
participants had submitted searches as a class exercise, and 
these were first analyzed by a Research Assistant, who did 
the corresponding manual search for each question. Since 
this was a sample of only eleven profiles, most of them 
written by non-chemists and therefore of a general nature, 
the purpose was only to use these 
profiles as a broad indication of how the system functioned, 
and as a guide to the keeping of useful statistics on 
computerized searching. The following is a summary of the 
analysis (it should be remembered that this was Chemical 
Abstracts Service's old format, prior to the introduction of 
the Standard Distribution Format). 



Phase I: Analysis of Eleven Profiles and Hits. 



The profiles were prepared by librarians participating 
in a seminar on computerized information services in 
libraries. The seminar was held during the Spring Quarter, 
1970, at the UCLA School of Library Service, and it was 
conducted by the Institute of Library Research. 

The profiles were run against tape 72:15 of 
Condensates at the Riverside campus of the University of 
California under the direction of Mrs. K. Forrest. 

Examination of each profile proceeded by: 

a. Reading the question, 

b. Searching the keyword subject index, 

c. Reading the abstract and deciding if it answered the 
question, 

d. Listing those abstracts deemed relevant to the 
profile, and 

e. Reading the abstract (hard copy) of all hits made by 
the computer and recording what possible 
term-matches were satisfying the profile. 

There were 4,916 entries on the tape used (74577 to 
79493). Based upon information found in Preparation of 
Search Profiles (a Chemical Abstracts Service publication), 
an assumption was made that hits result from term-matches of 
the profile and terms in the title and/or the keywords 
stored on the tape. Keywords were understood to be 
information-containing words which appear also in the 
abstract. Many were not identifiable in the hard copy of 
the abstract. This confirmed a report* from the Computer 
Center at the University of California at Santa Barbara in 
September, 1969: 

Only the title and a selected number of keywords 
appearing in the abstract are reproduced on tape, 
but not the entire abstract. ... This relationship 
between keywords and the abstract is not always 
apparent. It was found repeatedly that while a 
search term was not found in the abstract, it 




as evidenced 
ut. since the 
a term-match 
v presents a 



problem. 

*Bellomy, Fred and Hong, H. 
Abstracts SDI System. 
Barbara, September 3, 




Keywords were not necessarily taken from the abstract, 
but were sometimes the result of an abstractor's or editor's 
intellectual interpretation of the content of the abstract. 
The problem, as viewed from the standpoint of the user or 
the profile analyst, was: "Is there a significant 
difference between the retrieval from the tape (title and 
selected keywords) and retrieval based on term-matches with 
the title and abstract found in the hard copy? How does 
machine retrieval compare with manual retrieval using the 
subject index?" For an example of the various possibilities 
of retrieval, consider the 
profile for question no. 306, "Information on nutritious 
diets for pigs". Seven hits were made for which no 
term-matches were found in the hard copy of the abstract; 
term-matches were found with the keyword subject index. 
This indicates that the keywords included for this series of 
abstracts were not taken entirely from the abstract. Three 
abstracts found by manual search were not retrieved by 
computer even though term-matches were found in the title or 
body of the abstract. Seven hits were made by the computer 
which might be judged relevant, which were not found in the 
manual search. To generalize, 84% of the possible hits were 
made by the machine; 33% of the hits made were not traceable 
to term-matches in the abstract; 1.2% of the possible hits 
were not made by machine, but judged retrievable (containing 
parameters to satisfy term-matches with the abstract). 
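
The comparison of machine and manual retrieval described 
above can be sketched as a comparison of two sets of abstract 
numbers. The sketch below is illustrative only: the abstract 
numbers are invented, and it assumes that every abstract found 
by either method counts as a "possible hit", a definition the 
analysis itself does not state. 

```python
def compare(machine_hits, manual_hits):
    """Compare machine and manual retrieval for one profile."""
    machine, manual = set(machine_hits), set(manual_hits)
    # Assumption: anything found by either method is a possible hit.
    possible = machine | manual
    return {
        "found_by_both": sorted(machine & manual),
        "machine_only": sorted(machine - manual),
        "manual_only": sorted(manual - machine),
        "machine_share": len(machine) / len(possible) if possible else 0.0,
    }


# Invented abstract numbers, for illustration only.
result = compare(["a1", "a2", "a3", "a4"], ["a3", "a4", "a5"])
```

Here "machine_only" corresponds to hits made by the computer 
but not found manually, and "manual_only" to abstracts found 
manually but not retrieved by the computer. 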

Profile 316 provided an interesting study in 
interpretation and profile construction: "Biochemistry and 
pharmacology of aggressive behavior". The Research 
Assistant first interpreted this question to mean aggressive 
behavior in man, while the individual who constructed the 
profile did not exclude other forms of life. When assisting 
someone in construction of a profile, it is necessary to 
ascertain precisely what is actually desired. Profile 314 
asked for "Information on the accumulation and effects of 
Strontium 90 in mammals". The descriptors (in two 
parameters) related only to mammals as a unit; an 
interpretation of parts of mammals affected by Strontium 90 
(e.g. livers, bones, etc.) would have resulted in a larger 
group of retrieved abstracts. 
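
The effect of broadening a parameter can be shown with a 
small sketch. It assumes a simplified search logic in which 
every parameter of a profile must be satisfied by at least one 
of its terms; the actual logic of the CA Condensates search 
programs is not described here and may differ. The titles and 
terms are invented. 

```python
def matches(profile, text):
    """Return True if every parameter has at least one term present in the text.

    A profile is a list of parameters; each parameter is a list of
    alternative terms (assumed logic: AND between parameters, OR within).
    """
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    return all(any(term in words for term in parameter) for parameter in profile)


# Invented titles, for illustration only.
titles = [
    "Strontium 90 accumulation in mammals",
    "Uptake of strontium 90 by bone in rats",
    "Strontium 90 in liver tissue",
    "Calcium metabolism in mammals",
]

narrow = [["strontium"], ["mammals"]]
broad = [["strontium"], ["mammals", "bone", "liver"]]

narrow_hits = [t for t in titles if matches(narrow, t)]
broad_hits = [t for t in titles if matches(broad, t)]
```

With the second parameter limited to "mammals" only one title 
is retrieved; adding terms for parts of mammals retrieves three. 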

The keyword subject index, found on the pink pages of 
every issue of Chemical Abstracts, is not always easy to 
use. For example, an abstract pertaining to profile 302 
("How do widely used synthetic chemicals released into 
lakes, streams, oceans, affect marine animals used as 
food?") was missed (77865a) because the keyword subject 
index access was only by "mercurals grain toxicity" and 
"grain toxicity mercurals". There was no access by index 
terms such as "pollution", "toxicity", or "marine animals" 
even though the abstract did deal with these concepts. Note 
that the abstract was retrieved in the machine search. 

nly one entry for abstract number 77944e, under 
birds". Abstract number 76327u cannot be found 
s", but only under "dietary pig yeast". Manual 
of the keyword subject index, therefore, does not 
retrieval of all relevant abstracts relating to a 
How do most people conduct a search of current 
Chemical Abstracts? For those whose interests are 
to a narrowly definable field (within the CA 
scanning of the titles of a section is the most 
For those whose interests are of 
plinary scope, the keyword index must be 

descriptor combinations 
tion; about 50% of the 
files 304 and 306 did not 
ieved 65% of the hits. For 
or combinations (over 20), 
the 330 combinations for 
ot retrieve any hits (95%); 
a total of 17 (41%); each 
retrieved one hit (5.9% 
based upon examination of 

72:15 was used to select a 
aphy which would yield a large 
ious terms relating to the subject 
e number would not exceed 100. The 
its interdisciplinary application 
of Chemistry. Prior to the machine 
were listed which contained 
hard copy and the profile. A list 
s which did not lead to a match 
profile, but which (because of the 
generate proper term-matches with 
ne record. 



Of a possible total of 97 hits, the machine search 
retrieved 74 (76%). In Figure 2, list B contains the number 
of abstracts predicted to be hits and of hits actually made; 
list C shows the number of abstracts with no precise 
term-matches but which were predicted to be hits on the 
basis of keywords on the tape; and list D has the number of 
abstracts which were retrieved by machine and not manually. 

Of the 74 hits made by term-matches in the machine 
search, 7 could not be related to the hard copy. Of the 67 
remaining, 19 were retrieved by one logical combination of 
descriptors, 14 by another, 11 by a third. There were 18 
combinations possible, 8 of which produced no hits. 

Figure 2 

CA Condensates Test 

A. Possible hits . . . . . . . . . . . . . . . . . . . 97 
   Hits retrieved  . . . . . . . . . . . . . . . . . . 74 

B. Predicted hits 
   Hits made (from predicted group) 
   Hits not made (from predicted group)  . . . . . . . 23 

C. Predicted hits with no term-match in abstract . . .  3 
   Hits made with no term-match in abstract  . . . . .  3 

D. Hits not made manually - no term-matches  . . . . .  4 
   Hits not made manually - with term-matches  . . . .  8 
   Total hits not predicted  . . . . . . . . . . . . . 12 
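
The arithmetic of Figure 2 can be checked with a short 
sketch. It uses only figures stated in the text and in the 
figure, and it rests on an assumption not stated in the 
report: that lists B, C and D together account for all 
retrieved hits, and that possible hits are the retrieved hits 
plus the predicted hits not made. The derived values are 
therefore a consistency check, not reported results. 

```python
possible_hits = 97        # stated in the text
hits_retrieved = 74       # stated in the text
not_made_predicted = 23   # Figure 2, list B
made_no_term_match = 3    # Figure 2, list C
not_manual_no_match = 4   # Figure 2, list D
not_manual_with_match = 8 # Figure 2, list D
not_predicted = not_manual_no_match + not_manual_with_match

# Assumption: lists B, C and D partition the retrieved hits.
made_predicted = hits_retrieved - made_no_term_match - not_predicted

# Retrieval rate as a whole-number percentage.
retrieval_rate = round(100 * hits_retrieved / possible_hits)

# Under the same assumption, the possible hits are the retrieved
# hits plus the predicted hits that the machine did not make.
reconstructed_possible = hits_retrieved + not_made_predicted

# Hits with no term-match in the hard copy (lists C and D).
no_term_match_total = made_no_term_match + not_manual_no_match
```

Under these assumptions the totals agree with the 97 possible 
hits, the 76% retrieval rate, and the 7 hits that could not be 
related to the hard copy. 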

Education-Psychology Data. In August and September, 
1970, preliminary planning and committee work were 
undertaken towards the establishment of a test SDI project 
to be run by the Education and Psychology Library working 
with CIS project staff. Arrangements were made for a 
controlled group of about 10 to 15 faculty from the 
departments of Education and Psychology, and negotiations 
began with the American Psychological Association to obtain 
a part of the Psychological Abstracts data base. At the 
same time, the UCLA Library provided funds to purchase 
3-month update tapes (July-September 1970) of the two ERIC 
files, Research in Education (RIE) and Current Index to 
Journals in Education (CIJE) ($50 each). As a trial 
procedure, they were ordered using the library's regular 
acquisitions routines. 

Since the CIS project has already developed a versatile 
set of reference retrieval programs for use with the ERIC 
(RIE) data base, an SDI experiment in this field was seen as 
an appropriate vehicle by which to gather information and 
experience on several points: 

a. How will the faculty respond to the availability of 
this type of service from their disciplinary 
library? 

b. Can the disciplinary library at this point begin to 
assume a primary role in selective dissemination 
activities (i.e., did the first CIS Seminar lead to 
positive, tangible results)? 

c. How will the Library respond to the need to order 
files of data on tape under carefully observed 
conditions? 

d. How successfully can these data bases (three files 
from two sources) be processed and searched for the 
benefit of one fairly homogeneous intellectual group: 
what problems can be brought to the surface which 
are likely to be of a general nature in the handling 
of multiple data bases (problems of file 
manipulation, profile construction, user acceptance, 
etc.)? 



No adequate response was obtained from the American 
Psychological Association concerning Psychological 
Abstracts, and so the experiment proceeded with the two ERIC 
files. The programs for searching RIE, mentioned above, had 
been designed as modular search programs, and they were used 
to process CIJE after only minimal changes. 

will probably be able to handle tape acquisitions with only 
small procedural changes, one of the most pressing of which, 
the checking of incoming material so that funds can be 
disbursed, has been worked out satisfactorily in conjunction 
with the Systems department. Discussions on acquisition 
questions are continuing, as are discussions on the 
cataloging of ERIC tapes. Some trial cataloging of the ERIC 
material was circulated internally for discussion, and used 
as an excellent "real-life" example in the sessions on 
cataloging in the second CIS Seminar (see part 6 of this 
Report). The quarterly availability of ERIC tapes, although 
it has not been a problem in this initial phase since there 
were many other "set-up tasks" to be done, threatens 
to be too infrequent for a regular well-organized SDI 
service. The user can, after all, see the printed copy of 
the journals every month. The common subject terminology is 
a great asset to the system as a whole, although there are 
certain difficulties with the use of the thesaurus. Also, 
it was not clear at the outset whether the tape version of 
RIE contained the entire contents of the printed version 
(ED- and EP-numbered documents) or only the report resumes 
(ED); in fact it was the latter. Owing to the necessarily 
small number of staff engaged upon this trial service and 
the quarterly availability of the tapes, feedback has been 
somewhat slow in accumulating, but faculty response to the 
search results was encouraging: in most cases, a high rate 
of relevance was obtained (about 70%) after one or two 
modifications of the profile. 

III. COMPUTER ASPECTS 

INTRODUCTION 

Although Phase IIA has been concerned primarily with 
the design of the overall CIS system (administration, 
procedures, software and hardware), a considerable amount of 
experimental programming was performed to support this 
effort. 

Initially, this programming was performed by the ILR 
staff primarily to gain a familiarity with various aspects 
of the software and hardware problems. Key areas of concern 
were: 

a. Hardware 

b. Operating systems 

c. Programming languages 

d. Software system programming strategies 

In mid-1970 several members of the ILR staff were 
transferred to the Campus Computing Network (CCN) and the 
emphasis shifted toward experimental programming to clarify 
issues in the actual design of prototype CIS software and 
production programming to support experimental operation 
with the CA Condensates tapes. 



HARDWARE 

Phase IIA began with the premise that the CIS software 
system would be centered around an IBM 360/30 located in the 
University Research Library and would have available the 
services of an IBM 360/91 at the computing center for tasks 
beyond the economical capabilities of the library's 
computer. 

However, with the advent of a university-wide budget 
squeeze, the library has scaled down its planned computer to 
an IBM 360/25. This in turn has shifted the emphasis of the 
CIS software system to the IBM 360/91, with the library's 
computer performing as a small stand-alone computer for 
those tasks which it can perform, or as a remote job entry 
and input/output station to the IBM 360/91 for those tasks 
requiring the larger computer. 




OPERATING SYSTEMS 

The CIS software system is being designed to operate on 
the IBM 360/91 at the Campus Computing Network at UCLA. 
This computer uses the IBM Operating System/360 (OS/360) 
with the MVT (Multiprogramming with a Variable number of 
Tasks) option. 

While the CIS software system was still thought of as 
being centered around an IBM 360/30, however, it was 
recognized that it would be highly desirable to have 
compatible operating systems on the two computers. This 
would allow the computers to share common programming 
languages, programs, data set labels, etc. For this reason, 
a short study was made to determine if it was feasible to 
use the IBM Operating System/360 with the PCP (Primary 
Control Program) on a 65K IBM 360/30. 

This study concluded that it was indeed feasible, and 
went so far as to design a possible operating system 
configuration for the combination. The impossibility of 
acquiring an IBM 360/30, however, has reduced any interest 
in this study to a purely theoretical level. 

The currently envisaged IBM 360/25 will certainly use a 
software system supplied by CCN when operating as a remote 
job entry and input/output station to the IBM 360/91. This 
software will probably be very similar to that currently 
available to support the two local IBM 360/20's presently 
operating in similar capacities. Operating as a stand-alone 
computer, the IBM 360/25 will probably use the standard IBM 
Disk Operating System (DOS). While this is an adequate 
operating system in its own right, its incompatibilities 
with OS/360 on the IBM 360/91 will cause many problems. 
Amongst these are: 

a. Incompatible tape and disk data set labels. A data 
set written by one system cannot be conveniently 
read by the other. 

b. Incompatible programming languages. Programs 
written for one system cannot be used directly on 
the other. 

c. CCN personnel, while highly competent with OS/360, 
have little or no experience with DOS. 

These factors combine to push the IBM 360/25 into the 
background and assign to it the primary status of remote job 
entry and input/output station at this stage of the CIS 
project. This current status does not preclude greater 
utilization of the IBM 360/25 at a later stage, particularly 
during actual operation, when library personnel will become 
responsible for the project. 

PROGRAMMING LANGUAGES 

The selection of a programming language for a given 
computer application has been, historically, largely a 
matter of intuition. ILR felt that language selection for 
the CIS project should be placed on a more rational basis. 
To this end, a set of criteria for language selection was 
established and a number of available languages were rated 
subjectively by these criteria. Then a selection was made 
on the basis of the ratings. Happily, both intuition and 
language ratings agree on PL/I and Assembler as the 
programming languages to be used for the CIS project. 



Criteria 

The criteria for language selection fall into three 
categories: 

a. Language: based on the definition of the language. 

b. Implementation: based on an implementation of the 
language. 

c. Environment: based on the (local) environment. 

These categories and the criteria within them are not 
entirely independent. It is hoped, however, that any 
interactions have not significantly biased the conclusions. 

In addition, the nature of the application acts as a 
constraint in rating a given language by a given criterion. 

Language Criteria. The first six of these criteria are 
loosely grouped under the heading "Programming Time" in that 
a high rating will tend to reduce the amount of elapsed time 
required for programming. 

1) Problem oriented notation. Is the notation (syntax) 
of the language "natural" for the problem at hand? 

2) Range of operations. Does the language include a 
sufficient range of operations to handle the problem 
at hand, or is it necessary to resort to programming 
"tricks"? 

3) Volume of coding. Is the language sufficiently 
concise to minimize the probability of errors due to 
sheer volume of coding? 

4) Ease of learning. Is the language, or a subset of 
the language, easy to learn and apply to real life 
problems? 

5) Ease of coding. Is the language easy to code, or is 
it prone to difficulties due to complexity, coding 
restrictions or other reasons? 

6) Ease of maintenance. Is a completed program easy to 
document, understand and modify? 

7) Knowledge of machine. To what extent is a knowledge 
of the machine and/or machine language required to 
program in this language? 

8) Interaction with OS. To what extent are the 
services of the operating system available in the 
language? 

9) Ability to segment code. Does the language allow 
the segmentation of programs into smaller units, 
subroutines or functions? 



Implementation Criteria. The first three of these 
criteria are grouped under the heading "Efficiency" and 
include generally recognized measures of the efficiency of a 
language implementation. 

1) Compilation time. Does the program compile rapidly? 
This criterion declines in importance as the ratio 
of execution time to compilation time increases. In 
a production environment its importance is minimal. 

2) Execution time. Is the object code efficient in 
terms of execution speed? This criterion increases 
in importance with the maximum execution time for a 
program. A 100% increase in time due to 
inefficiency may be tolerable for a one minute 
program and intolerable for a one hour program. 

3) Object program size. Is the object code efficient 
in terms of the amount of storage used? This 
criterion decreases in importance as the amount of 
available storage increases. 

The next four criteria are grouped under the heading 
"Debugging" and serve to measure the degree to which the 
implementation assists in the debugging of programs. 

4) Compilation diagnostics. Are the compile time 
diagnostics complete, accurate, understandable and 
indicative of the actual problem? 

5) Execution diagnostics. Are the execution time 
diagnostics complete, accurate, understandable and 
indicative of the actual problem? 

6) Compilation listings. Are the listings provided by 
the compiler complete and informative? Do they 
provide adequate documentation for a finished 
program? 

7) Knowledge of machine. To what extent is a knowledge 
of the machine and/or machine language necessary for 
successful debugging? 

8) Interface with languages. How easily can routines 
written in this language use or be used by routines 
written in other languages? 



Environment Criteria. 

1) Evaluation of prior use. What is the overall 
opinion by users of this language as to its 
suitability for the problem at hand? 

2) Support by ILR. To what extent are ILR staff 
members familiar with and able to use or offer 
advice in the use of this language? 

3) Support by CCN. To what extent are CCN staff 
members able to use and offer advice in the use of 
this language? 

4) Support by vendor. To what extent is the vendor 
able to supply documentation, compilers, and advice 
in the use of this language? Is support available 
for the isolation and correction of errors or bugs 
in the implementation of the language? 

5) Programmer availability. Are programmers proficient 
in this language available for hiring? 

6) Convertibility. To what extent are programs in this 
language convertible for use on other computers of 
the same series or to other dissimilar computers? 




Candidates 



In making these ratings six languages were selected as 
candidates: 

a. ALGOL (Level F) 
b. Assembler (Level G) 
c. COBOL (Level F) 
d. FORTRAN (Level H) 
e. PL/I (Level F) 
f. RPG (Level E) 

Several of these languages are available in more than 
one implementation (Level) on the IBM 360/91. For these 
ratings, the most powerful implementation available has been 
chosen. 


Ratings 

Each of the six candidates was rated on a scale of 1 to 
5 (poor, below average, average, above average, excellent) 
for each of the twenty-three criteria. The result of this 
process is given in Figure 3. 



Analysis 

A superficial examination of the ratings would rank the 
languages in the order: PL/I, FORTRAN, COBOL, Assembler, 
ALGOL, and RPG. 

A more critical examination of the ratings, effectively 
applying weights to some of the criteria, reveals additional 
information. Perhaps the most important criteria are "Range 
of operations" and "Evaluation of prior use", both 
indicative of the appropriateness of the language to the 
problem. By these criteria ALGOL, FORTRAN, and RPG are 
judged to be inappropriate. 

The ratings of PL/I, in general, dominate those for 
COBOL, particularly in the "Range of operations", 
"Debugging", and "Environment" criteria. The only areas 
where COBOL dominates PL/I are "Execution time", "Object 
program size", and "Convertibility". These observations 
indicate that PL/I is more appropriate for program 
development, and COBOL is more appropriate for operation. 

The only areas of superiority for Assembler are in 
"Range of operations", "Execution time", and "Object program 
size". These indicate that Assembler would be most 
appropriate for operation and in this area dominates both 
COBOL and PL/I. 



Figure 3 

Programming Language Ratings* 

                               ALGOL  Assem  COBOL  FORT  PL/I  RPG 

Language 

  Programming time 
    Problem oriented notation    3      1      4      3 
    Range of operations          1      5      3      1 
    Volume of coding             3      1      3      3 
    Ease of learning             3      1      3      4 
    Ease of coding               3      1      3      4 
    Ease of maintenance          3      1      3      3 
  Knowledge of machine           4      1      4      4 
  Interaction with OS            1      5      2      2 
  Ability to segment code        3      4      3      4 

Implementation 

  Efficiency 
    Compilation time             3      2      3      3 
    Execution time               3      5      3      4 
    Object program size          3      5      3      4 
  Debugging 
    Compilation diagnostics      2      4      3      3 
    Execution diagnostics        2      1      3      3 
    Compilation listings         2      4      3      3 
    Knowledge of machine         3      1      4      3 
  Interface with languages       1      4      3      4 

Environment 

  Evaluation of prior use        1      3      2      1 
  Support by ILR                 1      2      2      3 
  Support by CCN                 1      4      2      4 
  Support by vendor              1      3      4      4 
  Programmer availability        1      2      2      4 
  Convertibility                 3             4      3 

  Total                                61     69     74    76    48 

(The individual ratings for PL/I and RPG, one Assembler 
rating under Environment, and the ALGOL total are not legible 
in this copy.) 

* Rating Scale: 5 - Excellent 
                4 - Above average 
                3 - Average 
                2 - Below average 
                1 - Poor 
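
The totals and the screening step of the analysis can be 
reproduced for the columns of Figure 3 that are fully legible. 
The sketch below is illustrative only: the ratings are 
transcribed from the ALGOL, COBOL and FORTRAN columns in the 
order of the twenty-three criteria, and the screening rule (a 
rating of 1 on both "Range of operations" and "Evaluation of 
prior use") is an assumption chosen to mirror the analysis, 
not a rule stated in the report. 

```python
# Ratings in criterion order: 9 language, 8 implementation, 6 environment.
ratings = {
    "ALGOL":   [3, 1, 3, 3, 3, 3, 4, 1, 3,  3, 3, 3, 2, 2, 2, 3, 1,  1, 1, 1, 1, 1, 3],
    "COBOL":   [4, 3, 3, 3, 3, 3, 4, 2, 3,  3, 3, 3, 3, 3, 3, 4, 3,  2, 2, 2, 4, 2, 4],
    "FORTRAN": [3, 1, 3, 4, 4, 3, 4, 2, 4,  3, 4, 4, 3, 3, 3, 3, 4,  1, 3, 4, 4, 4, 3],
}

RANGE_OF_OPERATIONS = 1       # index of the criterion in each list
EVALUATION_OF_PRIOR_USE = 17  # index of the criterion in each list

# Unweighted totals, as in the bottom line of the figure.
totals = {language: sum(values) for language, values in ratings.items()}

# Assumed screening rule: "poor" on both key criteria means inappropriate.
inappropriate = sorted(
    language
    for language, values in ratings.items()
    if values[RANGE_OF_OPERATIONS] == 1 and values[EVALUATION_OF_PRIOR_USE] == 1
)
```

The computed totals for COBOL and FORTRAN agree with the 
figure, and the screening step singles out ALGOL and FORTRAN, 
as in the analysis. 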

Selections 

The development of the software system for CIS will 
take place at CCN on the IBM 360/91. Due to the 
appropriateness of the language to the problem area, the 
relative ease of debugging and the available support for the 
language, PL/I was selected to be the primary language 
during system development. Assembler language was selected 
for use (sparingly) when needed to supplement the "Range of 
operations" and "Interaction with OS" facilities of PL/I. 

During later phases of the project, it will probably 
prove desirable to rewrite the software system in Assembler 
language to take full advantage of its excellent ratings for 
operational efficiency. 



SOFTWARE SYSTEM PROGRAMMING STRATEGIES 

There are a number of overall programming strategies 
possible for the CIS software system. During Phase IIA, ILR 
staff examined several of these in varying depth. 



Custom Programming 

This involves acquiring or writing a program for each 
operation to be performed. Each select step, report step, 
etc., would be a separate program written for the purpose at 
hand. The read routines in Appendices A to F are examples 
of this approach. Custom programming is unquestionably a 
slow and expensive process and fails to recognize common 
elements of programs to perform similar functions. It would 
also lead to a proliferation of programs and procedures for 
their use. 



Data Base Conversion 

This involves defining a canonical form for each type 
of data base (bibliographic, numerical and full-text) and 
converting data bases to canonical form. Custom programs 
may be used for conversion. This approach is not 
unreasonable for bibliographic data bases (e.g. Nuclear 
Science Abstracts and CA Condensates), which are inherently 
similar in nature, but it becomes impractical when applied 
to numerical data bases (e.g. the 1970 Census and COMPUSTAT) 
or full-text data bases (e.g. the California Constitution 
and Webster's Dictionary) due to gross differences in the 
data bases. IBM's TEXT-PAC system is based on converting 
bibliographic data bases into a form that can be used by the 
system for Selective Dissemination, Retrospective 
Searching and other functions. 
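
Conversion to a canonical form can be sketched for two 
bibliographic files. The file names, source field names and 
canonical names below are all hypothetical, chosen only to 
show the shape of the approach; they do not describe the 
format of any actual data base. 

```python
# Hypothetical mapping from each file's own field names to canonical names.
CANONICAL_NAMES = {
    "file_a": {"AU": "author", "TI": "title", "KW": "subject_terms"},
    "file_b": {"AUTHOR": "author", "TITLE": "title", "DESCRIPTORS": "subject_terms"},
}


def to_canonical(source, record):
    """Rename the fields of one record; fields with no canonical name are dropped."""
    mapping = CANONICAL_NAMES[source]
    return {mapping[name]: value for name, value in record.items() if name in mapping}


from_a = to_canonical(
    "file_a",
    {"AU": "Smith, J.", "TI": "Diets for pigs", "KW": ["swine", "nutrition"], "XX": "unused"},
)
from_b = to_canonical(
    "file_b",
    {"AUTHOR": "Smith, J.", "TITLE": "Diets for pigs", "DESCRIPTORS": ["swine", "nutrition"]},
)
```

Once converted, records from either file can be handled by 
the same search and report programs. The sketch also shows 
the cost of the approach: any element without a canonical 
counterpart is lost in conversion. 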




Modular Programming 

This involves identification of commonly used 
procedures (e.g. file read) which are dependent upon a 
particular data base. These procedures would be written for 
each data base, as necessary, and combined with program 
skeletons to form complete programs. The reference 
retrieval program presented in Part I of this Final Report 
is an example of this approach. 
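
The division between a data-base-dependent read routine and 
a data-base-independent program skeleton can be sketched as 
follows. The two file layouts are invented for the example and 
do not correspond to any of the files studied; the project's 
own programs were written in PL/I, not Python. 

```python
import io


def read_fixed(stream):
    """Read routine for a hypothetical file: 10-character author, 20-character title."""
    for line in stream:
        line = line.rstrip("\n")
        yield {"author": line[:10].strip(), "title": line[10:30].strip()}


def read_delimited(stream):
    """Read routine for a hypothetical file with '|' between author and title."""
    for line in stream:
        author, title = line.rstrip("\n").split("|")
        yield {"author": author, "title": title}


def search(read_routine, stream, term):
    """Program skeleton: knows nothing of the file layout beyond the read routine."""
    return [rec["title"] for rec in read_routine(stream) if term in rec["title"].lower()]


fixed_file = io.StringIO(
    "".join(
        "{:<10}{:<20}\n".format(author, title)
        for author, title in [("Adams, R.", "Yeast for pigs"), ("Moore, L.", "Zinc in soils")]
    )
)
delimited_file = io.StringIO("Smith, J.|Diets for pigs\nJones, P.|Marine toxicity\n")

hits = search(read_fixed, fixed_file, "pigs") + search(read_delimited, delimited_file, "pigs")
```

Adding a further data base then requires only a new read 
routine; the skeleton is reused unchanged. 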



Language Development 

This involves the development of a programming language 
(perhaps based on PL/I) to handle files and file 
descriptions in a straightforward manner. The use of such a 
language would not reduce the number of programs to be 
written (as opposed to Custom Programming) but would 
simplify the development of individual programs. This 
approach is used by the General Electric Company in their 
Integrated Data Store (IDS) system. IDS uses a 
non-procedural language to define file structures and a 
procedural language imbedded in a host language (e.g. COBOL 
or FORTRAN) for file processing. 



Program Generator Development 

This involves identification of commonly used 
procedures (e.g. sort, report) and the development of 
program generators to generate programs for these procedures 
from specifications of files and operations desired. This 
approach is exemplified by IBM's Sort/Merge and Report 
Program Generator. These programs, however, do not have 
sufficient capabilities for the purposes of this 
project. 

Data Base Management System 

This involves identification and acquisition of an 
existing Data Base Management System of sufficient 
capability to serve the purposes of this project. Dozens of 
these systems are available from universities, computer 
manufacturers, independent software firms and others. To 
date, none has proved suitable for CIS. This approach 
would be quite reasonable, however, if a suitable system could be 
found. 



DBMS System Development 

This involves the development of a Data Base Management 
System with the capabilities needed. Such a system would be 
basically similar to existing systems of this nature, but 
would incorporate features necessary for the CIS software 
and would be oriented toward efficient use in the CIS 
environment. This is the approach that has been selected 
for the CIS project. 

APPENDIX A 

API FILE: 
READ ROUTINE AND OTHER INFORMATION 

(K. D. Reilly, December 1969) 

BACKGROUND AND RECORD FORMAT 

The API file, a sample of approximately 5000 records of 
which is available at UCLA, is produced by the American Petroleum 
Institute. The file is a strict "bibliographic" file consisting 
of the following type of information: 

document number 
author(s) 
title 
descriptors (major and minor) 
bibliographic reference 


The initial portion of each record consists of record-control 
information which will allow the programmer to compute the 
relative location of any field in the record. (Variable-length 
fields in API are a variable number of fixed-length segments.) 
Size is specified in terms of the number of these segments. The 
segment sizes used are 36 characters (for authors and descriptors) 
and 66 characters (for titles and the bibliographic reference). 
In addition to this, the total record size is given in a special 
form: a number which must be multiplied by six (6) in order to 
get the total number of characters in the record. 
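
The size arithmetic described above can be sketched in a few lines. The sketch is in Python, used here only for illustration (the report's own routines are in PL/1); the function and constant names are hypothetical, and only the three numbers 36, 66 and 6 come from the text.

```python
# Constants taken from the text; all function and variable names are hypothetical.
SHORT_SEGMENT = 36   # characters per author or descriptor segment
LONG_SEGMENT = 66    # characters per title or bibliographic-reference segment
SIZE_UNIT = 6        # the stored record size counts units of six characters


def field_length(segment_count, segment_size):
    """Length in characters of a field made of fixed-length segments."""
    return segment_count * segment_size


def record_length(stored_size):
    """Total characters in the record, from the number stored in the record."""
    return stored_size * SIZE_UNIT


# A title occupying 2 segments, and a record whose stored size number is 75:
title_chars = field_length(2, LONG_SEGMENT)
total_chars = record_length(75)
```

The example values (2 segments, a stored size of 75) are invented for the illustration and are not taken from an actual API record.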

Additional description of the record format can be found in 
API Information Retrieval System: Computer Manual, prepared by 
Allen J. Humphrey (New York, API, n.d.). 



READ ROUTINE FOR API TAPE 



The purpose of a "read" routine for a tape is to "expose" some 
or all of the fields of each record so that such (field) data can 
be manipulated (searched, printed out, etc.) by programs (subroutines) 
used in conjunction with the "read" routine. 

An example of a "read" routine in PL/1 is listed below. This 
particular routine was used in conjunction with a subroutine 
called OUTPUT (for printing out the fields of the record in a 
particular format, 3 x 5 cards) and thus many of the fields are 
DECLARED with the EXTERNAL attribute for inter-routine communication 
purposes. The important features to note are: 



1. The (elementary) structure that can be used with API 
   records. Identifiable in this structure are: 

   RECLEN      which, when converted to numerical from 
               character form and multiplied by 6, gives 
               the record size. See the field RECL below. 

   DOC_#       the document number assigned by the API staff. 
               (The document number is a code that tells 
               something about the nature of the data represented 
               in the record (e.g., a literature item, etc.) 
               and the sequence number for the particular 
               year in which it was indexed. The programmer 
               who wishes to make use of this data must refer 
               to the API information in the Computer Manual.) 

   TTLL        a character which, when converted to numeric 
               form and multiplied by 66, gives the number of 
               characters in the TITLE field. See TTL below. 

   MAUT        a pair of characters which, when converted to 
               numeric form, give the number of authors. See 
               the NAUT field below. 

   BIBREFLEN   a single character which, when converted to 
               numeric form and multiplied by 66, gives the 
               length of the bibliographic field (BIBREF). 
               See the BIBREFL field below. 

   Special consideration must be given to the two fields REC1 and 
   REC2. If these character strings are concatenated and then converted 
   to numeric form they give the number of descriptors in the record. 
   See the MDES and NDES fields below. 

2. This particular routine goes through the record from the 
   front and extracts the fields in their natural order in 
   the record. NOTE: it is not necessary to go through the 
   record in this front-to-rear fashion. As we have remarked 
   above, fields can be picked out in arbitrary order; to do 
   this, the programmer must do some arithmetic over and above 
   that displayed in the routine listing. 

3. The routine stores individual authors in a PL/1 array 
   called AUT. It puts the individual descriptors into the 
   array DES and then puts the Primary Descriptor, which has 
   an asterisk ("*") in position 34, into SUB(1) and all other 
   descriptors which are access points to the document --these 
   have a "P" for Permuted in position 34-- into subsequent 
   positions in SUB. Descriptors which are not access points, but 
   merely supply the user with further pertinent information about 
   the document, have no "P" in position 34. (SUB contains 
   the major descriptors, then, and DES contains all of the 
   descriptors.) The auxiliary variable, NSUB, is formed, 
   providing the OUTPUT subroutine with the number of 
   elements in SUB. 
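
The descriptor handling of item 3 can be sketched as follows. This is an illustration in Python rather than the report's PL/1, and it rests on two assumptions: the Primary Descriptor marker in position 34 is an asterisk (as the test in the listing suggests), and the descriptor text occupies positions 1 to 33. The function name is hypothetical.

```python
def split_descriptors(des):
    """Separate major descriptors from a list of 36-character descriptors.

    Position 34 (1-based) holds '*' for the Primary Descriptor (assumed),
    'P' for a permuted access point, and anything else for a minor term.
    Returns the major descriptors, primary first, like the SUB array.
    """
    primary = None
    permuted = []
    for descriptor in des:
        flag = descriptor[33:34]           # position 34, counting from 1
        term = descriptor[:33].rstrip()    # positions 1-33 hold the term
        if flag == "*":
            primary = term
        elif flag == "P":
            permuted.append(term)
    sub = permuted if primary is None else [primary] + permuted
    return sub


# Three invented descriptors: one primary, one permuted, one minor.
des = [
    "TAR SAND".ljust(33) + "*" + "  ",
    "ATHABASCA AREA".ljust(33) + "P" + "  ",
    "CANADA".ljust(36),
]
sub = split_descriptors(des)
nsub = len(sub)   # plays the role of NSUB
```

Unlike the PL/1 routine, which always reserves SUB(1) for the Primary Descriptor, this sketch simply omits the first slot when no primary term is present.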

ACCESS TO THE API TAPE (UCLA USERS) 

The API records that ILR possesses are a sample from the total 
API file. They are contained on a single reel of tape (API001) at 
the Campus Computing Network in Bin #21. A standard request form 
will make the tape available to the user as he needs it. See CCN 
personnel for details. 

In order to access the tape, the following DD cards can be 
used:* 

//GO.TAPEIN DD UNIT=2400, 
//             DISP=OLD, 
//             VOLUME=SER=API001, 
//             LABEL=(,BLP,,IN), 
//             DCB=(RECFM=U,BLKSIZE=4500) 

The tape is 9-track and has a density of 800 BPI. 

*The TAPEIN used in GO.TAPEIN must be defined in the PL/1 program 
also, with the FILE attribute. 

REMARKS 

The "read" routine was developed while CCN had its IBM 360/75, 
but should run without difficulty on the present machine (IBM 360/91). 
The development effort took less than usual (3-4 runs) due to the 
fact that Alan Humphrey of ILR-Berkeley, formerly with the American 
Petroleum Institute, was able to provide us with helpful information, 
the essential (written) parts of which can be found filed with 
the ILR copy of the Computer Manual. 

The API file was used, along with MARC I, in some of the 
early experimenting with what might be called the RR approach 
("read" routine) to programming an MM (multiple-master-file) 
"library" of data bases. The RR approach makes use of common 
subroutines for search and output along with a "read" routine 
which is specific to each file, in the attempt to handle a 
variety of different master file systems (as is planned for a 
Center for Information Services). The few programs run against 
the API file, available in the tape documentation file of the ILR, 
center around the outputting of "hit" records in a 3 x 5 card 
format. (This output routine, presented as Appendix A, has been 
put to use not only with the API file but also with the MARC I, 
COMPENDEX, and ERIC files and is currently under further development.) 
At the time when efforts on API ceased, a sample of records had 
been placed on direct access for further experimenting. Needless 
to say, much more work can still be done profitably with the 
sample API records (e.g., investigation of "roles" and "links", 
etc.). Since the API file is a likely candidate for the eventual 
Center for Information Services data library, experience with 
it takes on added importance. 
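
The division of labor in the RR approach, one file-specific "read" routine per data base feeding common search and output subroutines, might be sketched as below. The sketch is in Python for illustration only; the two toy file formats, the field names and every function name are hypothetical and do not describe the actual API, MARC or ERIC layouts.

```python
def read_fixed(lines):
    """Hypothetical read routine: title in columns 1-20, author after."""
    for line in lines:
        yield {"title": line[:20].strip(), "author": line[20:].strip()}


def read_delimited(lines):
    """Hypothetical read routine for a file laid out as author|title."""
    for line in lines:
        author, title = line.split("|", 1)
        yield {"title": title.strip(), "author": author.strip()}


def search(records, field, term):
    """Common search subroutine: yield 'hit' records containing the term."""
    term = term.upper()
    for record in records:
        if term in record[field].upper():
            yield record


def output(record):
    """Common output subroutine: a small card-like rendering of one hit."""
    return record["author"] + "\n  " + record["title"]


# One entry per master file; only the read routine differs between files.
READ_ROUTINES = {"fixed": read_fixed, "delimited": read_delimited}


def run(file_kind, lines, field, term):
    """Search one file with the shared subroutines and its own read routine."""
    reader = READ_ROUTINES[file_kind]
    return [output(hit) for hit in search(reader(lines), field, term)]
```

Because `search` and `output` see only the exposed field names, adding another master file under this arrangement means writing one more read routine and one more table entry.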



LISTING OF READ-ROUTINE* 

LIST2: PROC OPTIONS (MAIN); 
       DCL AUT (10) CHAR (36) EXTERNAL, 
           OUTPUT ENTRY, 
           DES (100) CHAR (36), 
           MDES CHAR (3), 
           NDES BINARY, 
           NAUT BINARY FIXED EXTERNAL, 
           RECL BINARY FIXED EXTERNAL, 
           NSUB BINARY FIXED EXTERNAL, 
           TTL BINARY FIXED EXTERNAL, 
           BIBREFL BINARY FIXED EXTERNAL, 
           BGN BINARY, 
           SUB (10) CHAR (36) EXTERNAL, 
           BIBREF CHAR (198) VAR EXTERNAL, 
           TITLE CHAR (131) VAR EXTERNAL, 
           1 RECORD, 
             2 REC1 CHAR (1), 
             2 RECLEN CHAR (3), 
             2 DOC_# CHAR (8), 
             2 TTLL CHAR (1), 
             2 MAUT CHAR (2), 
             2 BIBREFLEN CHAR (1), 
             2 REC2 CHAR (2), 
             2 REC CHAR (4500), 
           TAPEIN FILE RECORD; 
       ON ENDFILE (TAPEIN) GO TO END; 
       PUT PAGE; 
       /* API TAPE */ 
       OPEN FILE (SYSPRINT) PAGESIZE (58); 
READ_IN: 
       /* CLEAR THE WORK AREAS BEFORE EACH RECORD */ 
       REC = (4500)' '; 
       L = 2; 
       AUT = ' '; 
       NAUT = 0; 
       RECL = 0; 
       NSUB = 0;   /* RESET SO THE COUNT APPLIES TO ONE RECORD */ 
       TTL = 0; 
       BIBREFL = 0; 
       SUB = ' '; 
       BIBREF = ' '; 
       TITLE = ' '; 
       READ FILE (TAPEIN) INTO (RECORD); 
       /* CONVERT THE CONTROL FIELDS TO NUMERIC FORM */ 
       MDES = REC1 || REC2; 
       NDES = MDES; 
       RECL = RECLEN;   /* SIZE NUMBER; 6*RECL IS THE LENGTH IN CHARACTERS */ 
       TTL = TTLL; 
       TTL = 66*TTL; 
       NAUT = MAUT; 
       BIBREFL = BIBREFLEN; 
       BIBREFL = 66*BIBREFL; 
       /* EXTRACT THE FIELDS FRONT TO REAR */ 
       TITLE = SUBSTR(REC,1,TTL); 
       BGN = TTL + 1; 
       DO I = 1 TO NAUT; 
          AUT(I) = SUBSTR(REC,BGN,36); 
          BGN = BGN + 36; 
       END; 
       BIBREF = SUBSTR(REC,BGN,BIBREFL); 
       BGN = BGN + BIBREFL; 
       DO J = 1 TO NDES; 
          DES(J) = SUBSTR(REC,BGN,36); 
          BGN = BGN + 36; 
          /* PRIMARY DESCRIPTOR GOES TO SUB(1) */ 
          IF SUBSTR(DES(J),34,1) = '*' THEN DO; 
             SUB(1) = SUBSTR(DES(J),1,33); 
             NSUB = NSUB + 1; 
          END; 
          /* PERMUTED DESCRIPTORS FOLLOW FROM SUB(2) */ 
          IF SUBSTR(DES(J),34,1) = 'P' THEN DO; 
             SUB(L) = SUBSTR(DES(J),1,33); 
             L = L + 1; 
             NSUB = NSUB + 1; 
          END; 
       END; 
       PUT EDIT (AUT) (SKIP,A); 
       PUT EDIT (SUB) (SKIP,A); 
       PUT EDIT (TITLE) (SKIP,A); 
       PUT EDIT (BIBREF) (SKIP,A); 
       PUT PAGE; 
       CALL OUTPUT; 
       GO TO READ_IN; 
END:   END LIST2; 



*The listing of OUTPUT is not given here. (It was a pre-compiled 
(object) routine in the present run.) Its output also is not 
listed. 




OUTPUT (FROM MAIN PROGRAM)* 
(one record -- annotated) 

GREAT CANADIAN OIL SANDS LTD                  } 
BRECKENRIDGE R M                              } Authors 

TAR SAND                                      } 
ATHABASCA AREA                                } Major Subjects 

PROGRESS REPORT...ATHABASCA TAR SANDS         } Title 

WORLD PETROL V37 N.13 51,54-56 %DEC 1966<     } Bibliographic Reference 

*The descriptors in the array DES were (unfortunately) not 
asked for in the program and accordingly do not appear. 




APPENDIX B 

U. S. CENSUS OF HOUSEHOLDS (1960) - 
1/1000 SAMPLE 

"READ" ROUTINE AND OTHER INFORMATION 
(K. D. Reilly, December 1969) 



BACKGROUND 



The 1960 Census of Households (1/1000 Sample of Households) 
file is available in its entirety at UCLA. It is at the Campus 
Computing Network and is available to any user; it is obtained 
by asking for tape CCN110. The Census of Households is only a 
very small part of the overall outpouring of data from the U. S. 
Department of Commerce, Bureau of the Census. Essentially, what 
the Census takers do every ten years is to ask every person that 
they can find certain basic questions; in order to comply with 
the laws of confidentiality, it is summary tabulations of these 
data that form the bulk of the "Census" data. In addition to the 
basic data collection there is the sample effort. In 1960, some 
20% of households were asked one set of questions; a different 5% 
were asked a somewhat (though not extensively) different set of 
questions. These household samples, for obvious reasons, are 
referred to as the 20% and 5% samples. For those questions which 
are asked of both 5% and 20% of households there is a 25% sample. 
(Note that the percentages changed in 1970; some new questions 
were added and old ones deleted, but the file remains substantially 
the same in content for the two periods.) The 1/1000 sample is a 
stratified sample extracted from the 25% sample; therefore, it 
contains both 5% and 20% records. The contents of the file can 
be ascertained in Figure 1, which provides the data item names 
and the status of each item (whether asked of everyone, a 5% 
sample, etc.) for both 1960 and 1970. Besides what is exhibited 
below, each record has a serial number, a region indicator, etc., 
as can be seen in the PL/1 structure of Figure 2.* 

*From this point on all references are to the 1960 file and may 
not be accurate for the 1970 file. 







FIGURE 1 

DATA ITEMS OF U. S. CENSUS 

                                                 Complete-Count or 
Population Items                                 Sample Percentage 
                                                  1960      1970 

Relationship to head of household . . . . . .     100       100 
Color or race . . . . . . . . . . . . . . . .     100       100 
Age (month and year of birth) . . . . . . . .     100       100 
Sex . . . . . . . . . . . . . . . . . . . . .     100       100 
Marital status  . . . . . . . . . . . . . . .     100       100 

State or country of birth . . . . . . . . . .      25        20 
Years of school completed . . . . . . . . . .      25        20 
Number of children ever born  . . . . . . . .      25        20 
Activity 5 years ago  . . . . . . . . . . . .       -        20 
Employment status . . . . . . . . . . . . . .      25        20 
Hours worked last week  . . . . . . . . . . .      25        20 
Weeks worked last year  . . . . . . . . . . .      25        20 
Last year in which worked . . . . . . . . . .      25        20 
Occupation, industry, and class . . . . . . .      25        20 
Income last year: 
   Wage and salary income . . . . . . . . . .      25        20 
   Self-employment income . . . . . . . . . .      25        20 
   Other income . . . . . . . . . . . . . . .      25        20 
Country of birth of parents . . . . . . . . .      25        15 
Mother tongue . . . . . . . . . . . . . . . .      25        15 
Year moved into this house  . . . . . . . . .      25        15 
Place of residence 5 years ago  . . . . . . .      25        15 
School or college enrollment (p...  . . . . .      25        15 
Veteran status  . . . . . . . . . . . . . . .      25        15 
Place of work . . . . . . . . . . . . . . . .      25        15 
Means of transportation to work . . . . . . .      25        15 

Occupation-industry 5 years ago . . . . . . .       -         5 
Citizenship . . . . . . . . . . . . . . . . .       -         5 
Year of immigration . . . . . . . . . . . . .       -         5 
Marital history . . . . . . . . . . . . . . .      25         5 
Vocational training completed . . . . . . . .       -         5 
Presence and duration of disability . . . . .       -         5 




FIGURE 1 (Continued) 

                                                 Complete-Count or 
Housing Items                                    Sample Percentage 
                                                  1960      1970 

Number of units at this address . . . . . . .       -       100 
Telephone . . . . . . . . . . . . . . . . . .      25       100 
Access to unit  . . . . . . . . . . . . . . .     100       100 
Kitchen or cooking facilities . . . . . . . .     100       100 
Complete kitchen facilities . . . . . . . . .       -       100 
Condition of housing unit . . . . . . . . . .     100         - 
Rooms . . . . . . . . . . . . . . . . . . . .     100       100 
Water supply  . . . . . . . . . . . . . . . .     100       100 
Flush toilet  . . . . . . . . . . . . . . . .     100       100 
Bathtub or shower . . . . . . . . . . . . . .     100       100 
Basement  . . . . . . . . . . . . . . . . . .      20       100 
Tenure  . . . . . . . . . . . . . . . . . . .     100       100 
Commercial establishment on property  . . . .     100       100 
Value . . . . . . . . . . . . . . . . . . . .     100       100 
Contract rent . . . . . . . . . . . . . . . .     100       100 
Vacancy status  . . . . . . . . . . . . . . .     100       100 
Months vacant . . . . . . . . . . . . . . . .      25       100 

Components of gross rent  . . . . . . . . . .      25        20 
Heating equipment . . . . . . . . . . . . . .      25        20 
Year structure built  . . . . . . . . . . . .      25        20 
Number of units in structure  . . . . . . . .      20        20 
Whether a trailer . . . . . . . . . . . . . .      25        20 
Farm residence (acreage and sales of farm 
   products)  . . . . . . . . . . . . . . . .      25        20 
Land used for farming . . . . . . . . . . . .      25         - 

Source of water . . . . . . . . . . . . . . .      20        15 
Sewage disposal . . . . . . . . . . . . . . .      20        15 
Bathrooms . . . . . . . . . . . . . . . . . .      20        15 
Air conditioning  . . . . . . . . . . . . . .       5        15 
Automobiles . . . . . . . . . . . . . . . . .      20        15 
Stories, elevator in structure  . . . . . . .      20         5 
Fuel--heating, cooking, water heating . . . .       5         5 
Bedrooms  . . . . . . . . . . . . . . . . . .       5         5 
Second home . . . . . . . . . . . . . . . . .  [illegible]    5 
Clothes washing machine . . . . . . . . . . .       5         5 
Clothes dryer . . . . . . . . . . . . . . . .       5         5 
Dishwasher  . . . . . . . . . . . . . . . . .  [illegible]    5 
Home food freezer . . . . . . . . . . . . . .  [illegible]    5 
Television  . . . . . . . . . . . . . . . . .  [illegible]    5 
Radio . . . . . . . . . . . . . . . . . . . .       5         5 
Besides the basic documentation provided by the Census Bureau, the 
tape documentation file contains a Master's thesis by K. Pearson, 
PROVIDING FOR MACHINE-READABLE STATISTICAL DATA SETS IN UNIVERSITY 
RESEARCH LIBRARIES, published as Report SP-3155/000/00 by System 
Development Corporation. This work covers a number of issues of 
much broader scope than what is discussed in this report; its 
main interest here is that it contains an example program using 
the U. S. Census of Households (1/1000 Sample) file. A basic 
data quality control routine** checks "common universe" totals. 

RECORD FORMAT 

Each item or field in the U. S. Census of Households (1/1000 
Sample) file is fixed in length. Most of the fields are single- 
character (0-9; V or X) codes, although a few, e.g., wage and 
salary information, are multiple-character. The record size is 
a total of 120 characters. The 5% and 20% records are 
distinguished by a simple check of the content of a single 
field located at the 113th position in the record. This field is 
blank for the 20% sample. 
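
The check on position 113 can be sketched as below, in Python for illustration (the report's routines are in PL/1). The record length of 120 and the blank test come from the text; the function name and the choice to raise an error on a wrong-length record are assumptions.

```python
RECORD_LENGTH = 120          # characters per record, as stated in the text
SAMPLE_FLAG_POSITION = 113   # 1-based position of the distinguishing field


def sample_kind(record):
    """Return '20%' when position 113 is blank, otherwise '5%'."""
    if len(record) != RECORD_LENGTH:
        raise ValueError("expected a 120-character record")
    flag = record[SAMPLE_FLAG_POSITION - 1]   # convert to a 0-based index
    return "20%" if flag == " " else "5%"


# Two invented records that differ only at position 113.
five_percent_record = "1" * 120
twenty_percent_record = "1" * 112 + " " + "1" * 7
kinds = [sample_kind(five_percent_record), sample_kind(twenty_percent_record)]
```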

READ ROUTINE 

The purpose of a "read" routine for a file is to "expose" 
some or all of the fields of each record so that such (field) 
data can be manipulated (scanned, compared to other data, printed 
out, etc.) in subroutines used in conjunction with the "read" 
routine. The concept of a "read" routine for a fixed-field record, 
in general, consists merely of devising the structure(s) necessary 
to provide "names" for the field data, and thus is little more than 
a section of code that can be placed away on a disk and called 
(say by the PL/1 preprocessor command, INCLUDE) in any particular 
program for which the structure(s) is (are) needed. This is the 
approach discussed in the cited work of K. Pearson. Pearson's 
"included" text also has a means of separating the 5% and 20% 
samples and of providing for conversion of certain character-string 
fields to numerical form. No provision comparable to Pearson's 
conversion subroutine is presented in this report. The approach 
taken to separate the two samples here makes use of PL/1 pointer 
variables to "overlay" a portion of core (in the buffer region) 
with two different structures, one for the 5% records and the other 
for the 20% records. These structures at the top level (see Figure 2) 
are: 

**A listing of this routine, written by A. deBoer of the CIS staff, 
is in the ILR tape documentation file. The results obtained 
from running this program and a complete paper listing of the 
entire file are in-house at CCN. 
FIGURE 2 

STRUCTURES FOR 5% AND 20% SAMPLE RECORDS OF 
U. S. CENSUS OF HOUSEHOLDS FILE 
(1/1000 Sample), 1960 

DCL 1 FIVE BASED (PT), 
      2 SERIALNR             CHAR (5), 
      2 REGION               CHAR (1), 
      2 PLACESIZE            CHAR (1), 
      2 SMSA                 CHAR (1), 
      2 URBANSIZE            CHAR (1), 
      2 AGE                  CHAR (2),   /* 10 - 11 */ 
      2 BIRTHQRTR            CHAR (1), 
      2 SEX_MRG_QRTR         CHAR (1), 
      2 YR_1ST_MGR           CHAR (2), 
      2 MARITALSTATUS        CHAR (1), 
      2 RELATION             CHAR (1), 
      2 FAMILYNR             CHAR (1), 
      2 FAMILYTYPE           CHAR (1), 
      2 RACE                 CHAR (1),   /* 20 */ 
      2 NATIVITY             CHAR (1), 
      2 PARENTORIGIN         CHAR (2), 
      2 RECODE_PAR_ORIGIN    CHAR (1), 
      2 POB_FOREIGN          CHAR (2), 
      2 POB_RECODE           CHAR (1), 
      2 MOTHERTONGUE         CHAR (2), 
      2 MOMTONGUERECODE      CHAR (1),   /* 30 */ 
      2 YEARMOVED            CHAR (1), 
      2 SAME_SMSA_1955       CHAR (1), 
      2 METRO1955            CHAR (1), 
      2 HIGHESTGRADE         CHAR (1), 
      2 SCHOOL               CHAR (1), 
      2 EMPLOYSTATUS         CHAR (1), 
      2 HOURS                CHAR (1), 
      2 OCCUPATION           CHAR (3), 
      2 INDUSTRY             CHAR (3),   /* 41 - 43 */ 
      2 WORKERCLASS          CHAR (1), 
      2 WORKPLACE            CHAR (1), 
      2 TRANSPORT            CHAR (1), 
      2 WEEKSWORKED59        CHAR (1), 
      2 CHILDRENBORN         CHAR (1), 
      2 TOTAL59EARN          CHAR (1), 
      2 WAGES59              CHAR (3),   /* 50 - 52 */ 
      2 SELF59               CHAR (3), 
      2 OTHER59              CHAR (3), 
      2 TOTAL59INCOME        CHAR (3),   /* 59 - 61 */ 
      2 SOCIOSTATUS          CHAR (1), 
      2 SOCIOCONSTANT        CHAR (1), 
      2 HOUSEHOLDTYPE        CHAR (1), 
      2 HOUSEHOLDTOTAL       CHAR (1), 
      2 NR_NONRELATIVE       CHAR (1), 
      2 NR_IN_FAMILY         CHAR (1), 
      2 NR_REL_UNDER18       CHAR (1), 
      2 AGE_UNDER18          CHAR (1), 
      2 NR_65                CHAR (1),   /* 70 */ 
      2 NR_LABOR             CHAR (1), 
      2 NR_EMPLOYED          CHAR (1), 
      2 NR_UNEMPLOYED        CHAR (1), 
      2 FAM20LABORSTAT       CHAR (1), 
      2 RELATEDEARNERS       CHAR (1), 
      2 FAMILY59TOTALINC     CHAR (3), 
      2 FAM59INC             CHAR (1), 
      2 OWNCHILD_UNDER3      CHAR (1),   /* 80 */ 
      2 OWNCHILD_3TO4        CHAR (1), 
      2 OWNCHILD_5           CHAR (1), 
      2 OWNCHILD_6           CHAR (1), 
      2 OWNCHILD_6TO11       CHAR (1), 
      2 OWNCHILD_12TO17      CHAR (1), 
      2 OWNSINGLE_UNDER18    CHAR (1), 
      2 OWNSINGLE_18PLUS     CHAR (1), 
      2 OWN_GRADE_1TO8       CHAR (1), 
      2 AGE_YOUNGEST         CHAR (1), 
      2 AP_AGE               CHAR (2),   /* 90 - 91 */ 
      2 AP_SEX_MRGSTATUS     CHAR (1), 
      2 AP_RACE              CHAR (1), 
      2 AP_HIGHESTGRADE      CHAR (1), 
      2 AP_EMPLOYSTATUS      CHAR (1), 
      2 AP_HRSLASTWK         CHAR (1), 
      2 AP_OCCUPATION        CHAR (1), 
      2 AP_CHILDBORN         CHAR (1), 
      2 AP_TOTAL59INC        CHAR (1), 
      2 MOM_HIGHESTGRADE     CHAR (1),   /* 100 */ 
      2 MOM_LABORSTATUS      CHAR (1), 
      2 TENURE               CHAR (1), 
      2 KITCHENPHONE         CHAR (1), 
      2 ROOMS                CHAR (1), 
      2 PERSONS_PER_ROOM     CHAR (1), 
      2 BATH                 CHAR (1), 
      2 WATER                CHAR (1), 
      2 TOILET               CHAR (1), 
      2 BATHING              CHAR (1), 
      2 BUILT                CHAR (1),   /* 110 */ 
      2 HEATING              CHAR (1), 
      2 AUTOS                CHAR (1), 
      2 CONTRACT_RENT        CHAR (1), 
      2 GROSS_RENT           CHAR (1), 
      2 VALUE_RATIO          CHAR (1), 
      2 BEDROOMS             CHAR (1), 
      2 FUEL                 CHAR (1), 
      2 WASHERDRYER          CHAR (1), 
      2 RADIOTV              CHAR (1), 
      2 AIRCON_FREEZER       CHAR (1); 

DCL 1 TWENTY BASED (PT), 
      2 FILLER1              CHAR (26), 
      2 POB_US               CHAR (1), 
      2 FILLER2              CHAR (9), 
      2 YEARLASTWORKED       CHAR (1), 
      2 FILLER3              CHAR (10), 
      2 VETERAN              CHAR (1), 
      2 FILLER4              CHAR (15), 
      2 QUARTERSTYPE         CHAR (1), 
      2 FILLER5              CHAR (14), 
      2 TOTAL59INC_NOTFAM    CHAR (1), 
      2 FILLER6              CHAR (16), 
      2 AP_YRLASTWORKED      CHAR (1), 
      2 FILLER7              CHAR (1), 
      2 AP_VETERAN           CHAR (1), 
      2 FILLER8              CHAR (15), 
      2 PROPERTYVALUE        CHAR (1), 
      2 RENT_RATIO           CHAR (1), 
      2 BATHROOMS            CHAR (1), 
      2 BLANKFOR20PC         CHAR (1), 
      2 STRUCTURETYPE        CHAR (1), 
      2 STORIES              CHAR (1), 
      2 SEWAGE               CHAR (1), 
      2 BASEMENT             CHAR (1); 

Note: The structure, TWENTY, has not been utilized other 
      than for simple listings and there may be some 
      inaccurate and/or misleading name usages in the latter 
      portions, e.g., around STORIES where some sharing of 
      fields may occur. Refer to the original documentation 
      for clarification on this matter. 
     FIVE BASED (PT), 
     TWENTY BASED (PT). 

Note that all the field names of both structures are available 
to the programmer to manipulate. Thus POB_US and POB_RECODE are 
one and the same piece of data, but represent different concepts 
depending on whether the particular record is from the 5% or the 
20% sample. One PL/1 READ command suffices to "fill" both 
structures at the same time; the statement appears as: 

     READ FILE (DSINPUT) SET (PT); 
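
The overlay idea, two sets of field names laid over one and the same buffer, can be imitated outside PL/1. The following Python sketch is an illustration only: the class and function names are hypothetical, and the starting positions (27 and 37) are assumed values obtained by summing the field lengths listed in Figure 2.

```python
# (start position counting from 1, length) for a few fields of each structure.
FIVE = {"SERIALNR": (1, 5), "POB_RECODE": (27, 1), "HOURS": (37, 1)}
TWENTY = {"POB_US": (27, 1), "YEARLASTWORKED": (37, 1)}


class Overlay:
    """Names fields of whatever record currently sits in the shared buffer."""

    def __init__(self, layout):
        self.layout = layout
        self.buffer = ""

    def field(self, name):
        start, length = self.layout[name]
        return self.buffer[start - 1:start - 1 + length]


def read_record(record, *views):
    """One read 'fills' every view, since all of them share the same record."""
    for view in views:
        view.buffer = record


five = Overlay(FIVE)
twenty = Overlay(TWENTY)
# An invented 120-character record with '7' at position 27.
record = "00042" + "x" * 21 + "7" + "y" * 93
read_record(record, five, twenty)
same_data = five.field("POB_RECODE") == twenty.field("POB_US")
```

As in the PL/1 version, nothing is copied per structure; which set of names is meaningful for a given record still has to be decided by the program.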

SPECIAL COMMENTS 

There are possible further developments (currently under 
investigation) of the "read" routine concept as it might be applied 
to the Household Sample data file. In further development of the 
idea it must be taken into account that individual records in the 
file often cannot be used alone. This is because the records of 
all the members of a single household follow one another in sequence. 
Since the serial number changes when a new household arises in 
the file, a "read" routine can be constructed so that for certain 
studies the second, third, etc., records for a given family can 
be "thrown" away (e.g., a study in which the extent of association 
between the education level of the head of the household and the 
number of automobiles is sought). Of course, this is but one 
possible "logical" division of family records and on other occasions 
it may be that the wife or the children or members of the "sub- 
family" are under investigation. 
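
A "read" routine that keeps only the first record of each household might look like the sketch below. It is written in Python for illustration and assumes that the serial number occupies positions 1 to 5 (SERIALNR is the first, 5-character field of Figure 2) and that the first record of a household is the one wanted; the function name is hypothetical.

```python
def first_record_per_household(records):
    """Yield only the first record of each run of equal serial numbers."""
    previous_serial = None
    for record in records:
        serial = record[0:5]          # positions 1-5 hold the serial number
        if serial != previous_serial:
            yield record              # a new household begins here
        previous_serial = serial


# Invented, shortened records: two households of two members, one of one.
records = ["00001A", "00001B", "00002A", "00003A", "00003B"]
kept = list(first_record_per_household(records))
```

A study of wives, children or sub-families would need a different selection rule inside the loop; only the grouping by serial number would carry over.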

Still another vagary in these data that the ideal "read" 
routine must cope with is that of "shared" field locations. The 
field HOURS gives the number of hours worked in the previous week 
(previous to the Census taker's call upon the household) for 
working individuals, whereas it gives the last year worked for 
retired and unemployed individuals. The "tip-off" for which 
meaning this field takes on in a particular context is the value 
of the field EMPLOYSTATUS, the field just in front of HOURS in 
the structure. Note in passing that there are still other 
peculiarities that must be taken into account, e.g., the special 
treatment needed for handling the automobile ownership field. It 
would appear that incorporation of these complexities into a "read" 
routine is not impossible by any means but is not necessarily a 
straightforward matter. 
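
One way a "read" routine could resolve the shared HOURS location is sketched here in Python, for illustration only. The actual EMPLOYSTATUS code values are not given in this report, so the set of "not working" codes is passed in by the caller; the codes used in the example ("R", "U", "W") are purely hypothetical, as is the function name.

```python
def interpret_hours(employ_status, hours_code, not_working_codes):
    """Label the shared HOURS field according to EMPLOYSTATUS.

    not_working_codes: the EMPLOYSTATUS values that mark retired or
    unemployed persons (to be taken from the Census documentation).
    """
    if employ_status in not_working_codes:
        return ("last year worked", hours_code)
    return ("hours worked last week", hours_code)


# Hypothetical codes: 'R' retired, 'U' unemployed, 'W' at work.
not_working = {"R", "U"}
meaning_a = interpret_hours("W", "4", not_working)
meaning_b = interpret_hours("R", "4", not_working)
```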



ACCESS TO THE CENSUS TAPE (UCLA USERS) 

The 1960 Census of Households (1/1000 Sample) tape is 
available in its entirety on a single reel of 800 BPI 
tape at CCN. It is not in the so-called User's Bin; it is on 
the CCN's own tape rack and may be retrieved from there by using 
the customary request sheet and merely asking for tape #CCN110. 
See CCN personnel for details. 

The Job Control Language for the tape is as follows:* 

//GO.DSINPUT DD VOL=SER=CCN110, 
//              DISP=(OLD,KEEP), 
//              DSN=CENSUS60, 
//              LABEL=(1,SL,,IN), 
//              DCB=(RECFM=FB,LRECL=120,BLKSIZE=12000), 
//              UNIT=2400 

*The PL/1 program must have a DECLARE command in it for a 
file (INPUT) with the name DSINPUT. 

REMARKS ON THE USE OF THE FILE 

Census data are very popular. They form the basis for a 
multitude of research projects in their own right. A paragraph 
of review on some of these projects is found on page 47 of the 
report by K. Reilly, "Nature of Typical Data Bases", Part 5 of 
the final report on "Mechanized Information Services in the 
University Library - Phase I: Planning", published by ILR. 

Census data formed the basis for a large-scale study in 
New Haven, Connecticut. A number (so far eleven) of documents 
published by the U. S. Bureau of the Census describe this study; 
collectively these documents bear the name CENSUS USE STUDY and 
individual reports include: General Description, Computer Mapping, 
Data Tabulation Activities, the DIME Geocoding System, Data 
Interests of Local Agencies, Family Health Survey, Health Information 
System, Data Uses in Health Planning, Data Uses in Urban Planning, 
Data Uses in School Administration, Area Travel Study. In 
addition to these activities, the New Haven study led to the 
development of "computer programming packages" for computer 
graphing and for merging of local data (differently organized 
data, i.e., different keys or serial numbers, etc.) with Census 
data to form a more comprehensive data base. These are, by name: 
DIME, a program package for creating the file that relates local 
data file keys or serial numbers to geographical keys or codes 
(e.g., Census tracts); GRIDS, a program package for use on small- 
scale computers for mapping of data, with "shading" capabilities, 
etc.; ADMATCH, a program package for assigning geographical codes 
to local records. Continuation of exploratory studies on the use 
of Census data is underway in Los Angeles. Part of this work is 
being done here at UCLA. The "partners" are ILR (informally), 
the Engineering Department, and SCRIS (Southern California 
Regional Information Study). 

The work to date that has been performed here at ILR has been 
primarily exploratory. As mentioned above, the file was reviewed 
in the study of Reilly and was involved in the exercise portion 
of the Pearson master's thesis. Pearson searched the file for 
librarians and librarian aids and printed out in tabular form some 
information (e.g., age breakdown, salary, portion of the country, 
etc.) about them. In addition to this, a portion of the records 
was placed on direct access, in part for purposes of random 
sampling from the file. There was a beginning of a study utilizing 
the IBM SSP (Scientific Subroutine Package) statistical programs 
in relation to the file within the scope of the "Center for 
Information Services" project. The first program in this effort 
was a simple Chi-square "association" test for education vs. number 
of automobiles. Information on these studies is available in the 
ILR tape documentation file. 
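
For readers unfamiliar with the Chi-square "association" test mentioned above, the computation on a two-way table of counts can be sketched as follows. This is a plain Python illustration and not the IBM SSP routine; the function name is hypothetical and the counts in the example are invented, not results from the Census file.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    table: list of rows, e.g. education level (rows) by number of
    automobiles (columns). Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom


# Invented counts: two education levels by two automobile-ownership classes.
statistic, dof = chi_square([[10, 20], [20, 10]])
```

The statistic is then compared with a tabulated Chi-square value for the given degrees of freedom to judge whether the two items are associated.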
APPENDIX C 

READ ROUTINE FOR THE CBB FILE 

(G. Silva, August 1969) 

I. Identification 

Title                    READMT 
User Group               Institute of Library Research 
Category                 CIS Text Processing 
Source Language          PL/1 
Machine Configuration    IBM 360/91 
Space Requirements       Step     Region    Core Used 
                         PL/1     158K      72K 
                         LKED     151K 
                         GO       150K 
Author                   G. Silva 
Date                     August 1969 
II. Purpose 

1. To read magnetic tape records of the Communications of 
   Behavioral Biology file. These records contain bibliographical 
   information described under "Record Format" (see section IV). 

2. To identify each field within a record. 

III. Tape Specifications 

1. 9 Track Tape 
2. 800 BPI 
3. Standard Label 
4. One File 
5. Volume Serial Number is ILR005 
6. The Data Set name is CBBCCF 
7. Record Format is Variable (V) 
8. The Block Size is 7200 
9. The Code is BCDIC 

The DD Card used to read the tape is: 

//GO.CBBR DD DSNAME=CBBCCF,UNIT=TAPE9,VOLUME=SER=ILR005, 
// DCB=(RECFM=V,BLKSIZE=7200),DISP=OLD,LABEL=(3,SL,,IN) 

Note: 
The subparameter IN (fourth subparameter of the LABEL 
parameter) tells the system that this tape is to be 
used as input only and eliminates the operator/computer 
dialogue on write rings. 
IV. File Description 

The file contains variable length records. 

Record Format 

Records consist of fixed header information followed by a 
variable length field. 

Header Information:                 Field Length    PL/1 Attributes 

Citation Number                     5 bytes         FIXED (9) 
Something                           1 byte          BIT (8) or CHAR (1) 
Number of Journal Title Words       2 bytes         BIT (16) 
Year Published                      2 bytes         FIXED (3) 
Type of Publication                 2 bytes         BIT (16) 
Date Merged into Master File        6 bytes         CHAR (6) 
Number of Descriptors               2 bytes         BIT (16) 
Number of Authors                   2 bytes         BIT (16) 
Language Code                       2 bytes         FIXED (3) 
Number of Title Words               2 bytes         BIT (16) 
Number of Page Words                2 bytes         BIT (16) 
Number of Date-Publ Words           2 bytes         BIT (16) 
Number of Vernacular Words          2 bytes         BIT (16) 
Something                           4 bytes         CHAR (4) 
Number of Abstracts                 2 bytes         BIT (16) 
File Source Code                    1 byte          FIXED (1) 
Author Type Code                    1 byte          FIXED (1) 
Combined Citation Search Code       1 byte          FIXED (1) 
Journal Title Code                  4 bytes         CHAR (4) 
Starting Page                       6 bytes         CHAR (6) 
Something                           1 byte          CHAR (1) 
Number of Address Words             2 bytes         BIT (16) 
Number of Abstract Words            2 bytes         BIT (16) 
Reserved                            2 bytes         CHAR (2) 
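
Decoding the fixed header outside PL/1 could proceed as in the Python sketch below, which is an illustration and not part of READMT. It assumes that the FIXED fields are stored as packed decimal (two digits per byte, with the sign in the last half-byte) and that the BIT (16) counts are unsigned binary with the high-order byte first; both assumptions should be checked against the tape. Character fields are left as raw bytes because the character code conversion is not attempted here. The field names follow the PL/1 structure of section VI; the function names are hypothetical.

```python
HEADER_LAYOUT = [
    ("CITATION_NO", 5, "packed"), ("DISREGARD2", 1, "raw"),
    ("NO_J_TITL_WDS", 2, "bits"), ("YR_PUBL", 2, "packed"),
    ("TYPE_PUBL", 2, "bits"), ("MERGE_DATE", 6, "raw"),
    ("NO_DESCRIPT", 2, "bits"), ("NO_AUTHORS", 2, "bits"),
    ("LANG_CODE", 2, "packed"), ("NO_TITL_WDS", 2, "bits"),
    ("NO_PG_WDS", 2, "bits"), ("NO_DATE_PUBL_WDS", 2, "bits"),
    ("NO_VERN_WDS", 2, "bits"), ("DISREGARD3", 4, "raw"),
    ("NO_ABSTR", 2, "bits"), ("FILE_SOURCE_CODE", 1, "packed"),
    ("AUTHOR_TYPE_CODE", 1, "packed"), ("COMB_CIT_SEARCH_CODE", 1, "packed"),
    ("J_TITL_CODE", 4, "raw"), ("STARTING_PG", 6, "raw"),
    ("DISREGARD4", 1, "raw"), ("NO_ADDR_WDS", 2, "bits"),
    ("NO_ABSTR_WDS", 2, "bits"), ("DISREGARD5", 2, "raw"),
]
HEADER_LENGTH = sum(length for _, length, _ in HEADER_LAYOUT)


def unpack_decimal(raw):
    """Decode packed decimal: digit half-bytes followed by one sign half-byte."""
    half_bytes = []
    for byte in raw:
        half_bytes.append(byte >> 4)
        half_bytes.append(byte & 0x0F)
    sign = half_bytes.pop()
    value = 0
    for digit in half_bytes:
        value = value * 10 + digit
    return -value if sign in (0x0B, 0x0D) else value


def parse_header(raw):
    """Split the fixed header into named fields according to HEADER_LAYOUT."""
    if len(raw) < HEADER_LENGTH:
        raise ValueError("record shorter than the fixed header")
    fields, position = {}, 0
    for name, length, kind in HEADER_LAYOUT:
        piece = raw[position:position + length]
        if kind == "packed":
            fields[name] = unpack_decimal(piece)
        elif kind == "bits":
            fields[name] = int.from_bytes(piece, "big")
        else:
            fields[name] = piece      # left undecoded
        position += length
    return fields
```

Summing the lengths in the table gives a fixed header of 58 bytes, which is what `HEADER_LENGTH` evaluates to.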
Variable Portion 

                              Field Length                 PL/1 Attributes 

Authors (78 bytes each): 
   Reserved                   4 bytes                      CHAR (4) 
   No. of Print Words         1 byte                       FIXED (1) 
   No. of Sort Words          1 byte                       FIXED (1) 
   Sort Author                36 bytes                     CHAR (36) 
   Print Author               36 bytes                     CHAR (36) 

Title                         6 x No. of title words       CHAR (600) VAR 
Journal Title Abbreviation    6 x No. of jour. title wds   CHAR (60) VAR 
Pages                         6 x No. of page words        CHAR (30) VAR 
Publication Date              6 x No. of date pub. wds     CHAR (18) VAR 
Vernacular Title              6 x No. of vern. words       CHAR (480) VAR 
Address                       6 x No. of addr. words       CHAR (120) VAR 
Abstract                      6 x (No. of abstr) x 
                                  (No. of abstr wds)       CHAR (3000) VAR 
Descriptor Terms              48 x No. of descript.        CHAR (1000) VAR 
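
The length rules in the table above translate directly into offset arithmetic for the variable portion. The Python sketch below is an illustration under two assumptions: the fields occur in the record in the order in which the table lists them, and the counts are available under the names used in the PL/1 header structure. The function name and the example counts are hypothetical.

```python
AUTHOR_ENTRY_BYTES = 78   # 4 + 1 + 1 + 36 + 36
WORD_BYTES = 6            # one "word" is six bytes
DESCRIPTOR_BYTES = 48


def variable_field_spans(header):
    """Return (name, offset, length) for each field of the variable portion."""
    lengths = [
        ("authors", AUTHOR_ENTRY_BYTES * header["NO_AUTHORS"]),
        ("title", WORD_BYTES * header["NO_TITL_WDS"]),
        ("journal_title", WORD_BYTES * header["NO_J_TITL_WDS"]),
        ("pages", WORD_BYTES * header["NO_PG_WDS"]),
        ("publication_date", WORD_BYTES * header["NO_DATE_PUBL_WDS"]),
        ("vernacular_title", WORD_BYTES * header["NO_VERN_WDS"]),
        ("address", WORD_BYTES * header["NO_ADDR_WDS"]),
        ("abstract", WORD_BYTES * header["NO_ABSTR"] * header["NO_ABSTR_WDS"]),
        ("descriptors", DESCRIPTOR_BYTES * header["NO_DESCRIPT"]),
    ]
    spans, offset = [], 0
    for name, length in lengths:
        spans.append((name, offset, length))
        offset += length
    return spans


# Invented counts for one record.
counts = {
    "NO_AUTHORS": 2, "NO_TITL_WDS": 10, "NO_J_TITL_WDS": 3, "NO_PG_WDS": 1,
    "NO_DATE_PUBL_WDS": 2, "NO_VERN_WDS": 0, "NO_ADDR_WDS": 4,
    "NO_ABSTR": 1, "NO_ABSTR_WDS": 50, "NO_DESCRIPT": 5,
}
spans = variable_field_spans(counts)
total_bytes = spans[-1][1] + spans[-1][2]
```

Offsets here are counted from the start of the variable portion, i.e. from the first byte after the fixed header.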
V. Checkout 

1. Timing 

The program processed 100 logical records with the following 
results: 

Step Name      Step CPU Seconds 

PL1L           2.82s 
LKED            .55s 
GO             2.24s 

Average processing time: 22.4 millisecs/log. rec. 

Note that if printing of data elements is bypassed, the 
processing time is .26 s/100 log. recs, or 2.6 millisecs/log. rec. 
This is of the order of 2 mins/50,000 records. 

2. Error Checking 

None 

3. Cost 

Opening the CBB file was closely interwoven with the development 
and testing of a much larger text processing system. I am 
therefore unable to separate the cost of opening the file from 
the rest of the processing. 
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READ ROUTINE FOR THE CBB FILE 



VI. Program Description 

1 . Pr eprocessor variables : 



REC structure name for header information 


CBBJRECORD 


REC_SIZE length of variable part of record 


7000 Bytes 


LABSTR length of abstract field 


3000 Bytes 


MAX_REC_NO the maximum number of records 
to be processed in any run 


100 



2 . Processor variables (an order of occurrence in deck - ) 

CBB_RECORD structure name for entire record 

The variable names in the structure CBB_RECORD are self 
explanatory. * 

( 

KTR counts the number of records processed. / 



READ ROUTINE FOR THE CBB FILE 



3 . Flowchart 
//NRB409WT JOB '20,2500',SILVA
// EXEC PL1LFCLG,PARM='M,SM=(2,72,1),ST',PG=200,RG=150K
//PL1L.SYSIN DD *
 TESTCBB:
     PROCEDURE OPTIONS (MAIN);
 %   DCL ITEST FIXED;
 %   ITEST=1;
 %   ITEST=0;
 %   DCL REC CHAR;
 %   REC='CBB_RECORD';
 %   DCL REC_SIZE FIXED;
 %   REC_SIZE=7000;
 %   DCL LABSTR FIXED;
 %   LABSTR=3000;
 %   DCL MAX_REC_NO FIXED;
 %   MAX_REC_NO=100;
     DCL 1 REC BASED (MAIN_P),
           2 HEADER,
             3 CITATION_NO FIXED (9),
             3 DISREGARD2 BIT (8),
             3 NO_J_TITL_WDS BIT (16),
             3 YR_PUBL FIXED (3),
             3 TYPE_PUBL BIT (16),
             3 MERGE_DATE CHAR (6),
             3 NO_DESCRIPT BIT (16),
             3 NO_AUTHORS BIT (16),
             3 LANG_CODE FIXED (3),
             3 NO_TITL_WDS BIT (16),
             3 NO_PG_WDS BIT (16),
             3 NO_DATE_PUBL_WDS BIT (16),
             3 NO_VERN_WDS BIT (16),
             3 DISREGARD3 CHAR (4),
             3 NO_ABSTR BIT (16),
             3 FILE_SOURCE_CODE FIXED (1),
             3 AUTHOR_TYPE_CODE FIXED (1),
             3 COMB_CIT_SEARCH_CODE FIXED (1),
             3 J_TITL_CODE CHAR (4),
             3 STARTING_PG CHAR (6),
             3 DISREGARD4 CHAR (1),
             3 NO_ADDR_WDS BIT (16),
             3 NO_ABSTR_WDS BIT (16),
             3 DISREGARD5 CHAR (2),
           2 REMAINDER CHAR (REC_SIZE);
     DCL 1 AUTHORS BASED (P),
           2 RESERVE CHAR (4),
           2 NO_PRINT_WDS FIXED (1),
           2 NO_SORT_WDS FIXED (1),
           2 SORT_AUTHOR CHAR (36),
           2 PRINT_AUTHOR CHAR (36);
     DCL TITLE CHAR (600) VAR;
     DCL J_TITL_ABBR CHAR (60) VAR;
     DCL PAGES CHAR (30) VAR;
     DCL PUBL_DATE CHAR (18) VAR;
     DCL VERNACULAR_TITL CHAR (480) VAR;
     DCL ADDRESS CHAR (120) VAR;
     DCL ABSTRACT CHAR (LABSTR) VAR;
     DCL DESCR_TERMS CHAR (1000) VAR;
     DCL PROXY FIXED BIN (31) BASED (PROXY_PTR);
     DCL I FIXED BIN;
     ON ENDFILE (CBBR) GO TO FINIS;
     PROXY_PTR=ADDR (P);
     KTR=0;
 NEXTREC:
     KTR=KTR+1;
     IF KTR>MAX_REC_NO THEN GO TO FINIS;
     READ FILE (CBBR) SET (MAIN_P);
 %   IF ITEST=0 % THEN % GO TO IT20;
     /* PRINT OUT HEADER INFORMATION */
     PUT PAGE;
     PUT SKIP LIST ('CITATION_NO',CITATION_NO);
     PUT SKIP LIST ('DISREGARD2',DISREGARD2);
     PUT SKIP LIST ('NO_J_TITL_WDS',NO_J_TITL_WDS);
     PUT SKIP LIST ('YR_PUBL',YR_PUBL);
     PUT SKIP LIST ('TYPE_PUBL',TYPE_PUBL);
     PUT SKIP LIST ('MERGE_DATE',MERGE_DATE);
     PUT SKIP LIST ('NO_DESCRIPT',NO_DESCRIPT);
     PUT SKIP LIST ('NO_AUTHORS',NO_AUTHORS);
     PUT SKIP LIST ('LANG_CODE',LANG_CODE);
     PUT SKIP LIST ('NO_TITL_WDS',NO_TITL_WDS);
     PUT SKIP LIST ('NO_PG_WDS',NO_PG_WDS);
     PUT SKIP LIST ('NO_DATE_PUBL_WDS',NO_DATE_PUBL_WDS);
     PUT SKIP LIST ('NO_VERN_WDS',NO_VERN_WDS);
     PUT SKIP LIST ('DISREGARD3',DISREGARD3);
     PUT SKIP LIST ('NO_ABSTR',NO_ABSTR);
     PUT SKIP LIST ('FILE_SOURCE_CODE',FILE_SOURCE_CODE);
     PUT SKIP LIST ('AUTHOR_TYPE_CODE',AUTHOR_TYPE_CODE);
     PUT SKIP LIST ('COMB_CIT_SEARCH_CODE',COMB_CIT_SEARCH_CODE);
     PUT SKIP LIST ('J_TITL_CODE',J_TITL_CODE);
     PUT SKIP LIST ('STARTING_PG',STARTING_PG);
     PUT SKIP LIST ('DISREGARD4',DISREGARD4);
     PUT SKIP LIST ('NO_ADDR_WDS',NO_ADDR_WDS);
     PUT SKIP LIST ('NO_ABSTR_WDS',NO_ABSTR_WDS);
     PUT SKIP LIST ('DISREGARD5',DISREGARD5);
 %   IT20: ;
     P = ADDR (REMAINDER);
     DO I=1 TO NO_AUTHORS;
 %   IF ITEST=0 % THEN % GO TO IT30;
     PUT SKIP LIST ('AUTHORS',AUTHORS);
 %   IT30: ;
     PROXY=PROXY+78;
     END;
     I = NO_AUTHORS*78+1;
     NO_TITL_CHAR = 6*NO_TITL_WDS;
     TITLE = SUBSTR (REMAINDER,I,NO_TITL_CHAR);
     I=I+NO_TITL_CHAR;
     NO_J_TITL_CHAR = 6*NO_J_TITL_WDS;
     J_TITL_ABBR = SUBSTR (REMAINDER,I,NO_J_TITL_CHAR);
     I=I+NO_J_TITL_CHAR;
     NO_PG_CHAR = 6*NO_PG_WDS;
     PAGES = SUBSTR (REMAINDER,I,NO_PG_CHAR);
     I=I+NO_PG_CHAR;
     NO_DATE_PUBL_CHAR = 6*NO_DATE_PUBL_WDS;
     PUBL_DATE = SUBSTR (REMAINDER,I,NO_DATE_PUBL_CHAR);
     I=I+NO_DATE_PUBL_CHAR;
     NO_VERN_TITL_CHAR = 6*NO_VERN_WDS;
     VERNACULAR_TITL = SUBSTR (REMAINDER,I,NO_VERN_TITL_CHAR);
     I=I+NO_VERN_TITL_CHAR;
     NO_ADDR_CHAR = 6*NO_ADDR_WDS;
     ADDRESS = SUBSTR (REMAINDER,I,NO_ADDR_CHAR);
     I=I+NO_ADDR_CHAR;
     NO_ABSTR_CHAR = NO_ABSTR*NO_ABSTR_WDS*6;
     ABSTRACT = SUBSTR (REMAINDER,I,NO_ABSTR_CHAR);
     I=I+NO_ABSTR_CHAR;
     NO_DESCR_TERMS_CHAR = NO_DESCRIPT*48;
     DESCR_TERMS = SUBSTR (REMAINDER,I,NO_DESCR_TERMS_CHAR);
 %   IF ITEST=0 % THEN % GO TO IT50;
     PUT SKIP LIST ('TITLE',TITLE);
     PUT SKIP LIST ('J_TITL_ABBR',J_TITL_ABBR);
     PUT SKIP LIST ('PAGES',PAGES);
     PUT SKIP LIST ('PUBL_DATE',PUBL_DATE);
     PUT SKIP LIST ('VERNACULAR_TITL',VERNACULAR_TITL);
     PUT SKIP LIST ('ADDRESS',ADDRESS);
     PUT SKIP (2) LIST ('ABSTRACT',ABSTRACT);
     PUT SKIP LIST ('DESCR_TERMS',DESCR_TERMS);
 %   IT50: ;
     GO TO NEXTREC;
 FINIS:
     END TESTCBB;
/*
//GO.CBBR DD DSNAME=CBBCCF,UNIT=TAPE9,VOLUME=SER=ILR005,                X
//             DCB=(RECFM=V,BLKSIZE=7200),DISP=OLD,LABEL=(3,SL,,IN)
APPENDIX D

READ ROUTINE FOR THE COMPUTERIZED ENGINEERING INDEX FILE

(G. Silva. July 1969)

I. IDENTIFICATION

     Title                    TEST
     User group               Institute of Library Research
     Category                 CIS Text Processing
     Source Language          PL/1
     Machine configuration    IBM 360/91
     Space requirements       Step region:
                                PL1L   158K
                                GO     150K
                              Core used 8M-K
     Author                   G. Silva
     Date                     July 1969
II. PURPOSE

1. To read magnetic tape records of the Computerized Engineering
   Index file. These records contain bibliographical information
   described under "Record Format" (see Section IV).

2. To identify each field within a record.

3. To provide a detailed printout for each record (header
   information and variable part).
III. TAPE SPECIFICATIONS

1. 9 Track Tape

2. 800 BPI

3. No label. Note: There is a tape mark at the beginning of
   the tape, hence the first subparameter of the
   LABEL parameter in the DD statement is 2.
   (See DD card description below).

4. One file

5. Volume serial number is ILR080

6. Data set name is ENGDEX

7. Record format is U (undefined)

8. Maximum record length is 8000 bytes

9. Average record length is 2000 bytes

The DD card used to read the tape is:

//GO.XYZ DD DSNAME=ENGDEX,UNIT=TAPE9,VOLUME=SER=ILR080,
// DCB=(RECFM=U,BLKSIZE=8000),DISP=OLD,LABEL=(2,BLP,,IN)

Note. The subparameter IN (fourth subparameter of the LABEL
      parameter) tells the system that this tape is to be
      used as input only and eliminates the operator/computer
      dialogue on write rings.
IV. FILE DESCRIPTION

The file contains variable length unblocked records.

Record Format

Records consist of fixed header information followed by a variable
length field containing one data element.

Header Information

     Block Length               4 bytes
     Logical Record Length      4 bytes
     ID Number                  12 bytes
     Security                   1 byte
     Microfilm                  1 byte
     Sequential Binary ID       2 bytes
     Variable Part

Note. The data elements are stored in variable length
      fields. These fields are made up of one or more
      segments of length 72 bytes.

      Each segment consists of:

      1. A demarcation symbol    3 bytes
      2. A text line             69 bytes

The data elements and corresponding demarcation symbols are:

     Demarcation Symbols    Data Elements

     00K                    Title - 1st line
     KKK                    Title - 2nd - nth line
     09K                    Subject Headings, Subheading, EI No.
     10K                    ID Number
     15K                    Cite Number
     20K                    Author
     299                    Author
     30K                    Citation - 1st line
     KKK                    Citation - 2nd line to nth line
     40K                    Abstract - 1st line
     KKK                    Abstract - 2nd line to nth line
     60K                    Subject Heading, Subheading
     61K                    Sales Codes (unofficial)
     649                    Sales Codes (unofficial)
     65K                    Access Words (unofficial)
     699                    Access Words (unofficial)
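
The segment layout described above (a 3-byte demarcation symbol followed by a 69-byte text line) can be sketched in Python as follows. This is a modern illustration rather than the PL/1 routine of this appendix; the function name is hypothetical, and the sketch assumes that a `KKK` segment always continues the data element opened by the segment before it, as the table lists for citations and abstracts.

```python
SEGMENT_LEN = 72      # one segment = demarcation symbol + text line
SYMBOL_LEN = 3        # length of the demarcation symbol
CONTINUATION = "KKK"  # assumed to mean "continues the previous element"


def split_segments(variable_part):
    """Group 72-byte segments into (symbol, text) data elements."""
    elements = []
    for start in range(0, len(variable_part), SEGMENT_LEN):
        segment = variable_part[start:start + SEGMENT_LEN]
        symbol = segment[:SYMBOL_LEN]
        text = segment[SYMBOL_LEN:].strip()
        if symbol == CONTINUATION and elements:
            # Append the continuation line to the element it belongs to.
            prev_symbol, prev_text = elements[-1]
            elements[-1] = (prev_symbol, prev_text + " " + text)
        else:
            elements.append((symbol, text))
    return elements
```

Stripping each text line before joining also removes the blank padding that fills out every 69-byte line.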
V. CHECKOUT

1. Timing

The program processed 20 logical records with the following
results:

     Step Name              Step CPU Seconds

     PL1L                   3.66s
     GO                     3.93s
       Load timing           .21s
       Processing Time      3.72s

     186 milliseconds/logical record

2. Error Checking

The only data element checked was the demarcation symbol.
No "illegal" symbols were detected.

3. Cost

The tape was opened in 29 runs which took a total of 139.34
cpu seconds at a total cost of $46.08.

4. Remarks

Working with the Engineering Index file has proved somewhat
unsatisfactory.

a. The file has a tape mark at the beginning where one
   normally expects to find a label or nothing at all.
   This was not mentioned in the accompanying documentation
   and led to some waste of time and effort.
b. An attempt to process 100 records led to the discovery
   of a bad spot at the 26th record. The 91 was unable
   to handle that and the tape had to be dumped on the
   printer on the IBM Model 20. More waste of time
   and money.

c. The data is input in segments of length 72 bytes (see
   Section IV, File Description). A longer data element,
   such as the abstract, consists of several of these
   segments. Each segment consists of some text with
   the remaining portion filled with blanks. In my
   opinion this is bad for the following reasons:

   i.  There is a considerable amount of wastage of tape.

   ii. Titles, Citations and Abstracts have to be edited
       before output, i.e., multiple blanks have to be
       reduced to one blank to make the output more
       readable, which increases the processing time
       considerably.

d. The Subject heading and subheading appear twice: in
   the 09K field accompanied by the EI Number, and a
   second time in the 60K field, without the EI Number.
   In the first instance the subject heading and the
   subheading are linked by two dashes (--), whereas
   in the second, they are linked by a comma (,). No
   reason for the above discrepancies is given, nor is
   it clear why field 09K is repeated in field 60K (less
   EI Number).

e. Field 40K contains the abstract. The text of the
   abstract is always followed by the EI Number. This
   feature was not mentioned in the documentation.

f. Do we assume that EI "sales codes" are the same as
   the EI "card service category code"?

g. Access words are labelled "unofficial" in the
   documentation. None were encountered in the records
   processed. What is this field designed to contain?
   Will "access words" appear in future?

h. For neither the Subject Headings nor the Access words
   is a thesaurus provided. Are they available? Do
   they have controlled or uncontrolled vocabularies?
   (suggested by Dr. Reilly)

i. Note that the COMPENDEX file was copied from ILR050
   on to ILR080 on the Model 20. This was a disastrous
   move, since the Model 20 destroyed all the control
   blocks on the tape. (See Section IV on File
   Description for original record format.)
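
Remark c.ii concerns reducing runs of blanks to a single blank before output. A single-pass version of that edit can be sketched in Python as below. This is a modern illustration under my own assumptions, not the EDIT procedure of the program listing, and the function name is hypothetical. A single pass is used here because repeatedly searching for two adjacent blanks and rebuilding the string rescans the same text many times, which is one plausible reason for the processing cost noted in the remark.

```python
def collapse_blanks(text):
    """Return `text` with every run of blanks reduced to one blank."""
    out = []
    previous_was_blank = False
    for ch in text:
        if ch == " ":
            # Keep only the first blank of each run.
            if not previous_was_blank:
                out.append(ch)
            previous_was_blank = True
        else:
            out.append(ch)
            previous_was_blank = False
    return "".join(out)
```

Leading and trailing runs are also reduced to one blank each; they are not removed.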
VI. PROGRAM DESCRIPTION

1. Preprocessor variables:

     REC         structure name for header information        ENGINDX
     REC_SIZE    length of variable part of record            4000 bytes
     FLAGNO      number of different demarcation symbols      11
     FLAGKAR     length in characters (bytes) of              2
                 demarcation symbols

2. Processor variables: (In order of occurrence in deck)

     FIXED_PART      contains Header information
     REMAINDER       contains the variable portion of record
     DUMMY           contains one record
     SWITCH          is a label used for returning to fields
                     made up of repeating segments
     FLAG            contains the possible demarcation symbols
     LB              is a label array containing the labels leading
                     to the specific operations required by each field.

     The next ten variables are self-explanatory.

     PRINT_CONTROL   serves as a temporary location for demarcation
                     symbols
     PROXY_PTR       points to P
     LINE            serves as a temporary location for the "segments"
                     as defined under "Record format"
     PC              fixed length character array of length 2 serves
                     as temporary location for demarcation symbols.
     NREC            counts the number of records processed

3. Flowchart

[Flowchart not reproduced; the only legible box reads "Print record".]
4. Program listing.

 TEST:
     PROCEDURE OPTIONS (MAIN);
 %   DCL REC CHAR;
 %   REC='ENGINDX';
 %   DCL REC_SIZE FIXED;
 %   REC_SIZE=4000;
 %   DCL FLAGNO FIXED;
 %   FLAGNO=11;
 %   DCL FLAGKAR FIXED;
 %   FLAGKAR=FLAGNO*3;
     DCL 1 REC BASED (MAIN_P),
           2 FIXED_PART,
             3 ID_NO CHAR (12),
             3 SECURITY CHAR (1),
             3 MICROFILM CHAR (1),
             3 SEQ_ID_NO BIT (16),
           2 REMAINDER CHAR (REC_SIZE);
     DCL DUMMY CHAR (REC_SIZE) VAR;
     DCL SWITCH LABEL;
     DCL FLAG CHAR (FLAGKAR);
     DCL LB (FLAGNO) LABEL INIT (L1,L2,L3,L4,L5,L6,L7,L8,L9,L10,L11);
     DCL TITLE CHAR (140) VAR;
     DCL SUBJ_HEAD CHAR (70) VAR;
     DCL CITE_NO CHAR (69) VAR;
     DCL AUTHORS CHAR (140) VAR;
     DCL CITATION CHAR (140) VAR;
     DCL ABSTRACT CHAR (33?) VAR;
     DCL SUBJECT_HEADING CHAR (140) VAR;
     DCL SALES_CODES (3) CHAR (70) VAR;
     DCL ACCWDS1 CHAR (70) VAR;
     DCL ACCWDS2 CHAR (70) VAR;
     DCL PRINT_CONTROL CHAR (2) BASED (P);
     DCL PROXY FIXED BIN (31) BASED (PROXY_PTR);
     DCL LINE CHAR (69) BASED (P);
     DCL PC CHAR (3);
     ON ENDFILE (XYZ) GO TO FINIS;
     FLAG='A00A09A10A15A20A30A40A60A61A65A69';
     NREC=0;
     PUT PAGE;
 NEXTREC:
     K=1;
     NSALES=0;
     DUMMY=(REC_SIZE)' ';
     READ FILE (XYZ) INTO (DUMMY);
     NREC=NREC+1;
     IF NREC>20 THEN GO TO FINIS;
     PUT PAGE;
     MAIN_P=ADDR (DUMMY);
     /* PRINT HEADER INFORMATION */
     PUT SKIP LIST ('ID_NO',ID_NO);
     PUT SKIP LIST ('SECURITY',SECURITY);
     PUT SKIP LIST ('MICROFILM',MICROFILM);
     PUT SKIP LIST ('SEQ_ID_NO',SEQ_ID_NO);
     SUBJ_HEAD='';
     TITLE='';
     AUTHORS='';
     CITATION='';
     ABSTRACT='';
     SUBJECT_HEADING='';
     SALES_CODES='';
     PROXY_PTR=ADDR (P);
     DUMMY=SUBSTR (DUMMY,17);
     P=ADDR (DUMMY);
 S10:
     /* IDENTIFY PRINT CONTROL */
     PC='A'||PRINT_CONTROL;
     I=(INDEX (FLAG,PC)+2)/3;
     IF I=0 THEN DO;
        PUT SKIP LIST ('PRINT_CONTROL NOT FOUND IN LIST',PRINT_CONTROL);
        PUT SKIP LIST (DUMMY);
        GO TO S30;
     END;
     PROXY=PROXY+3;
     /* P NOW POINTS TO THE BEGINNING OF THE TEXT LINE */
     K=K+3;
     GO TO LB (I);
 S20:
     PROXY=PROXY+69;
     /* P NOW POINTS TO THE DEMARCATION SYMBOL */
     K=K+69;
     IF K>LENGTH (DUMMY) THEN GO TO S30;
     IF PRINT_CONTROL='KK' THEN DO;
        PROXY=PROXY+3;
        K=K+3;
        GO TO SWITCH;
     END;
     GO TO S10;
 L1:
     TITLE=TITLE||' '||LINE;
     SWITCH=L1;
     GO TO S20;
 L2:
     SUBJ_HEAD=SUBJ_HEAD||' '||LINE;
     SWITCH=L2;
     GO TO S20;
 L3:
     ID_NO=LINE;
     SWITCH=L3;
     GO TO S20;
 L4:
     CITE_NO=LINE;
     GO TO S20;
 L5:
     AUTHORS=AUTHORS||' '||LINE;
     SWITCH=L5;
     GO TO S20;
 L6:
     CITATION=CITATION||' '||LINE;
     SWITCH=L6;
     GO TO S20;
 L7:
     ABSTRACT=ABSTRACT||' '||LINE;
     SWITCH=L7;
     GO TO S20;
 L8:
     SUBJECT_HEADING=SUBJECT_HEADING||' '||LINE;
     SWITCH=L8;
     GO TO S20;
 L9:
     NSALES=NSALES+1;
     SALES_CODES (NSALES)=LINE;
     GO TO S20;
 L10:
     ACCWDS1=LINE;
     GO TO S20;
 L11:
     ACCWDS2=LINE;
 S30:
     /* EDIT AND PRINT DATA ELEMENTS */
     CALL EDIT (TITLE);
     PUT SKIP LIST ('TITLE',TITLE);
     PUT SKIP LIST ('SUBJ_HEAD',SUBJ_HEAD);
     PUT SKIP LIST ('ID_NO',ID_NO);
     PUT SKIP LIST ('CITE_NO',CITE_NO);
     PUT SKIP LIST ('AUTHORS',AUTHORS);
     CALL EDIT (CITATION);
     PUT SKIP LIST ('CITATION',CITATION);
     CALL EDIT (ABSTRACT);
     PUT SKIP LIST ('ABSTRACT',ABSTRACT);
     PUT SKIP LIST ('SUBJECT_HEADING',SUBJECT_HEADING);
     PUT SKIP LIST ('SALES_CODES',SALES_CODES);
     PUT SKIP LIST ('ACCESS WORDS',ACCWDS1);
     PUT SKIP LIST ('ACCESS WORDS',ACCWDS2);
     GO TO NEXTREC;
 EDIT:
     PROC (LINE);
     /* REDUCES MULTIPLE BLANKS TO ONE BLANK */
     DCL LINE CHAR (*) VAR;
 EDIT1:
     I=INDEX (LINE,'  ');
     IF I¬=0 THEN DO;
        IF I=LENGTH (LINE)-1 THEN DO;
           LINE=SUBSTR (LINE,1,I);
           RETURN;
        END;
        LINE=SUBSTR (LINE,1,I)||SUBSTR (LINE,I+2);
        GO TO EDIT1;
     END;
     RETURN;
     END EDIT;
 FINIS:
     END TEST;

NO ERROR OR WARNING CONDITION HAS BEEN DETECTED FOR THIS MACRO PASS.
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I D_.NO 


F I X6 QY C 8 00 o I 


SECIJR ITY 




MICROFILM 


f.j 


SCO ID mo 


• ( j r o c,r c r o ci o of' c c o l * r 


TITLE 


calculating truf’ voltag 


SUB J_H FAD 


P AD I O CIRCUITS — PULSE 


I D_NO 


F I X69XCB 3001 


C ITE_NO 


69— 03 AO0- 32765 


AUTHORS 


0 AVER ID AJ 


C ITATION 


FLECTRO- TECHMOLOGV V 6 


ABSTRACT 


METHOD FOP. REDUCING TIM 


TF EQUATIONS 


; LOWERING OF VOLTAGE DUE TO SERIES 


P. DUPING OFF 


TIMES OF PULSE ARE CONSIDERED. 0176 



ON PULSE- CHARGED CAPACITC 



2 N 2 AUG 1968 P 73- 
E TO DESIGN CIRCUITS 
ELEMENTS IN CHARGING 



^ o 

HAVING 

CIRCUIT 



SUBJECT-HEADING 
SALE S_CO DES ... 



RADIO CIPCUITS» 
OG-.AloB 



PULSE 



ACCESS 

ACCESS 



WORDS 

WORDS 



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SF- CHARGED CAP ACT TORS t \ 



A 1 76 3 



G 196.3 P 7 3- 
IGN C IP CU ITS 
IN CHARGING 



6 a 

HAVING CAPACITOR CJ 
CIRCUIT AND RIPPLE 



ARGFF) BY 
PRODUCED 



PULSE To AIN, USTNG SET Or 
BY DISCHARGING ACROSS LOAD 



C_Q— A 297 
APPENDIX E

READ ROUTINE FOR THE NSA SELECTOR FILE
(G. Silva. November 1969)

I. IDENTIFICATION

     Title                    TESTCA
     User Group               Institute of Library Research
     Category                 CIS Text Processing
     Source Language          PL/1
     Machine Configuration    IBM 360/91, one input tape,
                              printer
     Space Requirements       Step region:
                                PL1L: 162K
                                GO:   150K
                              Core Used: 80K
     Author                   G. Silva
     Date                     November 1969
II. PURPOSE

1. To read magnetic tape records of the Nuclear Science
   Abstracts Selector File. These records contain
   bibliographical information described under "Record
   Format" (see Section IV).

2. To identify each field within a record.

3. To provide a detailed printout for each record type
   (header information and variable part).

III. TAPE SPECIFICATIONS

1. 7 track tape

2. 556 BPI

3. Even parity

4. No label

5. Two files

   1st file: NSA Entry File Volume 23, Issue 14

   2nd file: NSA Selector File Volume 23, Issue 14

6. Tape identification: ILR070

7. Record format is U (undefined)

8. Blocksize is 2044 bytes

9. Character coding is BCDIC

The DD card used to read the Selector File is:

//GO.KEYWD DD DSNAME=NSA,UNIT=TAPE7,VOL=SER=ILR070,
// DCB=(RECFM=U,BLKSIZE=2044,DEN=1,TRTCH=ET),LABEL=(2,NL,,IN),DISP=OLD

Note. The subparameter IN (fourth subparameter of the LABEL
      parameter) tells the system that this tape is to be used
      as input only and eliminates the operator/computer dialogue
      on write rings.
IV. FILE DESCRIPTION

Keyword File

Each keyword file will contain the selectors, i.e.,
keywords and various "additional terms", assigned to items
within an issue of NSA.

The logical records, i.e., selectors and associated
header for each item, will be ordered by abstract number,
then by split*, then by type selector, then by alphabetic
sorting sequence of selectors. Thus the file is "linear"
with adjacent placement of selectors having split and type
selector codes in common.

Record Format

A. Each Selector Record Header format, which is
   identical for all types of items, describes:

   1. Year of the NSA volume

   2. NSA issue number

   3. NSA abstract number

   4. Serial number

   5. Overflow field, containing a 1 to indicate that
      the logical record overflows into the next
      physical record (lack of overflow is indicated
      by zero)

   6. Type of item

   7. NSA Section Subsection Code (revised effective
      NSA, Vol. 21, Issue 1)

   8. Blank field

   9. Total character count for the item, or portion
      of the item in a single block (The portion of a
      logical record continued in a second block begins
      with a duplicate Selector Record Header containing
      the number of characters in the second block in
      the total character count field.)

   10. Terminal record mark

B. Each Variable Field is terminated with a record
   mark. These variable-length selector fields
   identify:

   1. Split indication by means of a left adjusted
      letter code: A blnk blnk, B blnk blnk, A A
      blnk, A B blnk, ..., AAA, AAB, ..., ZZZ.

   2. Type of selector (1 = keyword, 2 = inorganic
      compound, 3 = isotope, 8 = additional term,
      and 9 = "provisional" additional term)

   3. Character count for the single selector field

   4. Selector in BCD notation

   5. Selector number, i.e., unique numeric code for the
      selector

If a selector appears in more than one split, a separate
field is provided for each time it is used.

*Split indicators are assigned to selectors which logically relate
to each other, permitting search strategies which fail to
coordinate selectors assigned to different splits. Thus,
splits function as "link" indicators.
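
The record format above can be sketched as a loop over the variable fields of one block. This is a modern Python illustration, not the PL/1 routine; the function name and dictionary keys are hypothetical. The field widths are assumptions taken from the declarations in the program listing of this appendix: a 36-character header, then per field a 3-character split, a 1-character selector type, a 2-character count, the selector itself, a 5-character selector number and a 1-character record mark.

```python
HEADER_LEN = 36       # total width of the Selector Record Header
FIXED_PER_FIELD = 12  # 3 + 1 + 2 + 5 + 1 characters around each selector


def parse_selector_fields(block):
    """Return the selector fields found after the header of one block."""
    fields = []
    pos = HEADER_LEN
    while pos + FIXED_PER_FIELD <= len(block):
        split = block[pos:pos + 3]
        selector_type = block[pos + 3]
        count = int(block[pos + 4:pos + 6])  # length of the selector text
        selector = block[pos + 6:pos + 6 + count]
        selector_no = block[pos + 6 + count:pos + 11 + count]
        fields.append({
            "split": split,
            "type": selector_type,
            "selector": selector,
            "selector_no": selector_no,
        })
        # Advance past this field, including its terminating record mark.
        pos += count + FIXED_PER_FIELD
    return fields
```

The sketch assumes trailing padding has already been removed from the block; a count field that is not numeric would raise an error rather than be skipped.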
V. CHECKOUT

1. Timing

The program processed 100 records with the following
results:

     Step Name                    Step CPU Seconds

     PL1L                         2.29s
     GO                           4.62s
       Load timing                 .32s
       Processing time            4.30s
       Average processing time    43 ms/log.rec.

2. Error Checking

None

3. Cost

The total cost of opening the tape at the rate of
$.12/MJS was $54.18.

4. Remarks

a. The NSA tape is a 7-track tape probably written on
   the IBM 7094. Each block therefore consists of a
   multiple of 6 bytes. In order to achieve this some
   blocks are padded with blanks at the end. The PL/1
   program requires that the real end of the block be
   known, i.e., the end of the last selector field in
   the record. Terminal blanks, if present, are therefore
   removed before the record is processed.

b. None of the 100 records processed were continued in a
   second block, i.e., no records were found with overflow=1.
   The longest record found had a character count of 1944.

c. According to information received from the Information
   Systems Department (July 14th), any blocks of less than
   18 characters on tape should be considered "noise"
   records and ignored. At present the program does not
   check for "noise" records, but it is advisable that
   such a check be built into it if this data base is
   used for production.
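
Remarks a and c can be combined in one small pre-processing step: drop blocks shorter than 18 characters, then strip the blank padding from the rest. The Python below is a sketch of that suggestion only; the check is not part of the program as written, and the names `NOISE_THRESHOLD` and `clean_block` are hypothetical. It assumes the length test applies to the block as read from tape, before padding is removed.

```python
NOISE_THRESHOLD = 18  # blocks shorter than this are treated as noise


def clean_block(block):
    """Return the block without trailing blanks, or None for a noise block."""
    if len(block) < NOISE_THRESHOLD:
        return None
    # Padding blanks were added to reach a multiple of 6 characters.
    return block.rstrip(" ")
```

A caller would skip any block for which the function returns `None` and pass the cleaned block on to field extraction.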



VI. PROGRAM DESCRIPTION

1. The logical flow of the program is fairly simple and a
   verbal description was therefore deemed redundant. The
   reader is referred to the attached flowchart and program
   listings.

2. All variable names used in the program are
   self-explanatory.

3. Flowchart:



//NRB409MT JOB '10,1000',SILVA
// EXEC PL1
 (SUBSCRIPTRANGE): (STRINGRANGE):
 KWDFILE: PROCEDURE OPTIONS (MAIN);
     /* NSA KEYWORD FILE */
     /* SECOND FILE ON NSA REEL */
     DCL 1 REC BASED (MAIN_P),
           2 HEADER,
             3 YR_ISSUE CHAR (2),
             3 ISSUE_NO CHAR (2),
             3 ABSTR_NO CHAR (7),
             3 SERIAL_NO CHAR (6),
             3 OVERFLOW CHAR (1),
             3 ITEM_TYPE CHAR (1),
             3 SPACE1 CHAR (1),
             3 SEC_SUBSEC_CODE CHAR (4),
             3 BLANKS CHAR (7),
             3 TOTAL_CHAR_CT CHAR (4),
             3 RECORD_MARK CHAR (1),
           2 VAR_FIELDS CHAR (2000);
     DCL 1 SUBREC BASED (P),
           2 FIXED_PART,
             3 SPLIT CHAR (3),
             3 SELECTOR_TYPE CHAR (1),
             3 CHAR_CT CHAR (2),
           2 SELECTORS CHAR (100);
     DCL PROXY FIXED BIN (31) BASED (PROXY_PTR);
     DCL N FIXED BIN;
     DCL SELECTOR_NO CHAR (5);
     DCL TESTAREA CHAR (2004) VAR;
     DCL SELECT CHAR (4?) VAR;
     ON ENDFILE (KEYWD) GO TO FINIS;
     PROXY_PTR=ADDR (P);
     KTR=0;
 NEXTREC:
     TESTAREA=(2004)' ';
     KTR=KTR+1;
     IF KTR>100 THEN GO TO FINIS;
     READ FILE (KEYWD) INTO (TESTAREA);
     LTEST=LENGTH (TESTAREA);
     /* REMOVE BLANKS, IF ANY, FROM END OF RECORD */
 KF1:
     IF SUBSTR (TESTAREA,LTEST,1)¬=' ' THEN GO TO KF5;
     TESTAREA=SUBSTR (TESTAREA,1,LTEST-1);
     LTEST=LTEST-1;
     GO TO KF1;
 KF5:
     MAIN_P=ADDR (TESTAREA);
     PUT PAGE;
     /* PRINT OUT MAIN HEADER INFORMATION */
     PUT EDIT ('HEADER INFORMATION') (SKIP, COL(60), A);
     PUT EDIT ('__________________') (SKIP(0), COL(60), A);
     PUT EDIT ('YEAR OF NSA VOLUME', YR_ISSUE) (SKIP(2), COL(40),
        A, COL(70), A);
     PUT EDIT ('NSA ISSUE NO', ISSUE_NO)
        (SKIP, COL(40), A, COL(70), A);
     PUT EDIT ('NSA ABSTRACT NO', ABSTR_NO)
        (SKIP, COL(40), A, COL(70), A);
     PUT EDIT ('SERIAL NO', SERIAL_NO)
        (SKIP, COL(40), A, COL(70), A);
     PUT EDIT ('OVERFLOW', OVERFLOW)
        (SKIP, COL(40), A, COL(70), A);
     PUT EDIT ('TYPE OF ITEM', ITEM_TYPE)
        (SKIP, COL(40), A, COL(70), A);
     PUT EDIT ('NSA SECTION, SUBSECTION', SEC_SUBSEC_CODE)
        (SKIP, COL(40), A, COL(70), A);
     PUT EDIT ('TOTAL CHAR COUNT', TOTAL_CHAR_CT)
        (SKIP, COL(40), A, COL(70), A);
     P=ADDR (VAR_FIELDS);
     KT=36;
     PUT EDIT ('SUBHEADER INFORMATION') (SKIP(5), COL(58), A);
     PUT EDIT ('_____________________') (SKIP(0), COL(58), A);
     PUT EDIT ('SPLIT', 'SELECTOR TYPE', 'CHAR COUNT', 'SELECTOR',
        'SELECTOR NO') (SKIP(5), COL(10), A, COL(35), A, COL(60), A,
        COL(85), A, COL(110), A);
 KF10:
     I=1;
     N=CHAR_CT;
     SELECT=SUBSTR (SELECTORS,I,N);
     I=I+N;
     SELECTOR_NO=SUBSTR (SELECTORS,I,5);
     PUT EDIT (SPLIT, SELECTOR_TYPE, CHAR_CT, SELECT, SELECTOR_NO)
        (SKIP, COL(10), A, COL(35), A, COL(60), A, COL(85), A,
        COL(110), A);
     KT=KT+N+12;
     I=I+11;
     IF KT=LTEST THEN DO;
        /* CHECK OVERFLOW */
        IF OVERFLOW='1' THEN
           PUT SKIP LIST ('RECORD CONTINUED IN THE NEXT BLOCK.');
        GO TO NEXTREC;
     END;
     PROXY=PROXY+I;
     GO TO KF10;
 FINIS:
     END KWDFILE;
/*
//GO.KEYWD DD DSNAME=NSA,UNIT=TAPE7,VOL=SER=ILR070,
// DCB=(RECFM=U,BLKSIZE=2044,DEN=1,TRTCH=ET),LABEL=(2,NL,,IN),DISP=OLD
/*
APPENDIX F

READ ROUTINE FOR THE
CHEMICAL ABSTRACTS CONDENSATE FILE
ILR060

By

Georgette Silva

July 1969

Institute of Library Research
University of California
Los Angeles, California

I. IDENTIFICATION

     Title                    TESTCA
     User Group               Institute of Library Research
     Category                 CIS Text Processing
     Source Language          PL/1
     Machine configuration    IBM 360/91, one input tape, printer
     Space requirements       Step region:
                                PL1L   158K
                                GO     150K
                              Core Used 76K
     Author                   G. Silva
     Date                     July 1969
II. PURPOSE

1. To read magnetic tape records of the Chemical Abstracts Condensate
   file. These records contain bibliographical information described
   under "Record Format" (see Section IV).

2. To identify each field within a record.

3. To provide a detailed printout for each record type (header
   information and variable part).
III. TAPE SPECIFICATIONS

1. 9 track tape

2. 800 BPI

3. Standard label

4. One file

5. Volume serial number is 000000

6. Data set name is CAISSV

7. Record format is variable blocked (VB)

8. Logical record length is 996

9. Block size is 1000

The DD card used to read the tape is:

//GO.CHEMAB DD DSNAME=CA.ISSV,UNIT=TAPE9,VOLUME=SER=000000,
// DCB=(RECFM=VB,BLKSIZE=1000,LRECL=996),DISP=OLD,LABEL=(1,SL,

Note. The subparameter IN (fourth subparameter of the LABEL parameter)
      tells the system that this tape is to be used as input only and
      eliminates the operator/computer dialogue on write rings.

IV. FILE DESCRIPTION

The file contains variable length blocked records. The records are of
five types, each type being designated within the record itself.

Record format

Records consist of fixed header information followed by a variable
length field, except for the type-1 record, which is of fixed length
throughout.

Header information:

     Abstract number     7 bytes
     Blanks              1 byte
     Volume              2 bytes
     Number              2 bytes
     Blanks              1 byte
     Record type         1 byte
     Blanks              2 bytes

Variable part:

     Data element        Record
     Title               Type-2
     Author              Type-3
     Where published     Type-4
     Key words           Type-5



The second part of the type-1 record is of fixed length (24 bytes)
and consists of the following elements:

     Journal CODEN          6 bytes
     Volume                 4 bytes
     Number                 4 bytes
     Year                   2 bytes
     Starting pg. number    4 bytes
     Ending pg. number      4 bytes



Note that each abstract may have several type-5 records (Keywords) 
associated with it. 
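
The record layout above can be illustrated with a short sketch. It is written in Python rather than the PL/I of the original routine, and it assumes the record has already been decoded into a character string (the tape's character code is not stated here). The names parse_header, parse_type1_body, Header and Type1Body are hypothetical. The header widths add up to 16 bytes (7 + 1 + 2 + 2 + 1 + 1 + 2), and the type-1 body to 24 bytes.

```python
from typing import NamedTuple

HEADER_LEN = 16      # 7 + 1 + 2 + 2 + 1 + 1 + 2 bytes, per the header table
TYPE1_BODY_LEN = 24  # 6 + 4 + 4 + 2 + 4 + 4 bytes, per the type-1 table


class Header(NamedTuple):
    abstract_number: str
    volume: str
    number: str
    record_type: str


class Type1Body(NamedTuple):
    coden: str
    volume: str
    number: str
    year: str
    start_page: str
    end_page: str


def parse_header(record: str) -> Header:
    """Slice the fixed header fields out of one logical record."""
    if len(record) < HEADER_LEN:
        raise ValueError("record is shorter than the 16-byte header")
    return Header(
        abstract_number=record[0:7],   # bytes 1-7
        volume=record[8:10],           # bytes 9-10 (byte 8 is blank)
        number=record[10:12],          # bytes 11-12
        record_type=record[13:14],     # byte 14 (byte 13 is blank)
    )


def parse_type1_body(record: str) -> Type1Body:
    """Slice the fixed 24-byte second part of a type-1 record."""
    body = record[HEADER_LEN:HEADER_LEN + TYPE1_BODY_LEN]
    if len(body) < TYPE1_BODY_LEN:
        raise ValueError("type-1 record is shorter than header + 24 bytes")
    return Type1Body(
        coden=body[0:6],
        volume=body[6:10],
        number=body[10:14],
        year=body[14:16],
        start_page=body[16:20],
        end_page=body[20:24],
    )
```

For records of types 2 through 5, everything after the first 16 characters would be treated as the variable part.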

V. CHECKOUT

1. Timing

   The program processed 100 logical records with the following results:

        Step Name     Step CPU Seconds
        PL1L          2.54s
        GO            0.68s

        Load timing        0.23s
        Processing time    0.45s

   i.e. 4.5 msecs/logical record

   The following is an approximation to the processing time
   required for a full bibliographical unit (a sequence of record
   types 1, 4, 2, 3, 4 followed by n type-5 records):

        t = 22.5 + 4.5 x n

   where t is in milliseconds.
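
As a worked illustration of the timing estimate above, the following Python sketch evaluates the formula. The function name unit_time_ms is hypothetical. The constant of 22.5 msec is read here as five fixed records at 4.5 msec each; that reading is an assumption consistent with the arithmetic, not a statement made in the report.

```python
MS_PER_RECORD = 4.5   # 0.45 s processing time / 100 logical records
FIXED_RECORDS = 5     # assumption: 22.5 msec / 4.5 msec per record


def unit_time_ms(n_keyword_records: int) -> float:
    """Approximate processing time, in milliseconds, for one
    bibliographical unit with n type-5 (keyword) records."""
    if n_keyword_records < 0:
        raise ValueError("n must not be negative")
    return FIXED_RECORDS * MS_PER_RECORD + MS_PER_RECORD * n_keyword_records
```

For example, a unit with four keyword records comes to 22.5 + 4.5 x 4 = 40.5 msec.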

2. Error checking

   a. Rec-type field.
      No "illegal" record types were detected.

   b. A check was made for records consisting of header information
      followed by a blank variable part. Some were found.

The tape was opened in 10 runs which took a total of 28.07 cpu
seconds at a total cost of $10.88.





VI. PROGRAM DESCRIPTION

1. Preprocessor variables:

     REC         structure name for header information       CACOND
     REC_SIZE    length of variable part of record           300 bytes
     FLAGNO      number of different record types            5
     FLAGKAR     length in bytes of field containing
                 record type                                 1

2. Processor variables: (in order of occurrence in deck)

     FLAG        contains the possible record types.
     LB          is a label array containing the labels leading to the
                 specific operations required by each record type.
     CACOND      structure name containing header information.
     REC1        structure containing second part of record type-1.
     DUMMY       contains one logical record.
     NREC        counts the number of records processed.

   The other variables are self-explanatory.
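
The role of FLAG and LB described above (look up the record type in FLAG, then branch through the label array) can be sketched as a table-driven dispatch. This is a Python illustration rather than the PL/I of the routine; the handler captions and the names make_printer and process are hypothetical. PL/I's INDEX returns 0 for "not found" and is 1-based, whereas Python's str.find returns -1 and is 0-based.

```python
from typing import Callable, List

FLAG = "12345"    # the legal record types, as held in FLAG
HEADER_LEN = 16   # length of the fixed header


def make_printer(caption: str) -> Callable[[str], str]:
    """Build a handler that labels the variable part of a record."""
    def handler(body: str) -> str:
        return f"{caption}: {body}"
    return handler


# One handler per record type, in the same order as FLAG
# (the counterpart of the label array LB).
LB: List[Callable[[str], str]] = [
    make_printer("TYPE-1 FIELDS"),
    make_printer("ARTICLE TITLE"),
    make_printer("AUTHORS"),
    make_printer("WHERE PUBLISHED"),
    make_printer("INDEX TERMS"),
]


def process(record: str) -> str:
    rec_type = record[13:14]              # byte 14 of the header
    i = FLAG.find(rec_type) if rec_type else -1
    if i < 0:                             # protection against illegal types
        return f"ILLEGAL REC_TYPE: {rec_type!r}"
    if len(record) < HEADER_LEN + 1:      # header with no variable part
        return "SUSPECT RECORD"
    return LB[i](record[HEADER_LEN:])     # branch on the record type
```

Keeping the type codes and the handlers in parallel tables means a sixth record type would need only one more entry in each.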






4. Program Listing 



// EXEC PL1,PARM='M,SM=(2,72,1),ST',PG=200
//PL1L.SYSIN DD *
 TESTCA:
   PROCEDURE OPTIONS(MAIN);
   % DCL REC CHAR;
   % REC='CACOND';
   % DCL REC_SIZE FIXED;
   % REC_SIZE=300;
   % DCL FLAGNO FIXED;
   % FLAGNO=5;
   % DCL FLAGKAR FIXED;
   % FLAGKAR=1;
   DCL FLAG CHAR(FLAGNO) INIT('12345');
   DCL LB(FLAGNO) LABEL INIT(L1,L2,L3,L4,L5);
   DCL 1 REC BASED(MAIN_P),
         2 FIXED_PART,
           3 ABSTR_NO CHAR(7),
           3 SPACE1 CHAR(1),
           3 VOL CHAR(2),
           3 NO CHAR(2),
           3 SPACE2 CHAR(1),
           3 REC_TYPE CHAR(FLAGKAR),
           3 SPACE3 CHAR(2);
   DCL 1 REC1 BASED(MAIN_P),
         2 J_CODEN CHAR(6),
         2 VOLUME CHAR(4),
         2 NUMBER CHAR(4),
         2 YEAR CHAR(2),
         2 STPGNO CHAR(4),
         2 ENDPGNO CHAR(4);
   DCL DUMMY CHAR(REC_SIZE) VAR;
   ON ENDFILE(CHEMAB) GO TO FINIS;
   NREC=0;
   PUT PAGE;
   MAIN_P=ADDR(DUMMY);
 NEXTREC:
   DUMMY=(REC_SIZE)' ';
   READ FILE(CHEMAB) INTO(DUMMY);
   NREC=NREC+1;
   IF NREC>100 THEN GO TO FINIS;
   PUT SKIP(3) LIST('ABSTRACT NUMBER',ABSTR_NO);
   PUT SKIP LIST('VOLUME',VOL);
   PUT SKIP LIST('NO',NO);
   PUT SKIP LIST('REC_TYPE',REC_TYPE);
   I=INDEX(FLAG,REC_TYPE);
   /* PROTECTION AGAINST ILLEGAL RECORD TYPES */
   IF I=0 THEN DO;
     PUT SKIP LIST('ILLEGAL REC_TYPE',REC_TYPE);
     PUT SKIP LIST(DUMMY);
     GO TO NEXTREC;
   END;
   /* PROTECTION AGAINST BLANK RECORDS */
   IF LENGTH(DUMMY)<17 THEN DO;
     PUT SKIP LIST('SUSPECT RECORD','NREC',NREC,DUMMY);
     NREC=NREC+1;
     GO TO NEXTREC;
   END;
   DUMMY=SUBSTR(DUMMY,17);
   GO TO LB(I);
   /* FOR TYPE-1 RECORDS DO: */
 L1:
   PUT SKIP LIST('J_CODEN',J_CODEN);
   PUT SKIP LIST('VOLUME',VOLUME);
   PUT SKIP LIST('NUMBER',NUMBER);
   PUT SKIP LIST('YEAR',YEAR);
   PUT SKIP LIST('STARTING PG. NO.',STPGNO);
   PUT SKIP LIST('ENDING PAGE NO',ENDPGNO);
   GO TO NEXTREC;
   /* FOR TYPE-2 RECORDS DO: */
 L2:
   PUT SKIP LIST('ARTICLE TITLE',DUMMY);
   GO TO NEXTREC;
   /* FOR TYPE-3 RECORDS DO: */
 L3:
   PUT SKIP LIST('AUTHORS',DUMMY);
   GO TO NEXTREC;
   /* FOR TYPE-4 RECORDS DO: */
 L4:
   PUT SKIP LIST('WHERE PUBLISHED',DUMMY);
   GO TO NEXTREC;
   /* FOR TYPE-5 RECORDS DO: */
 L5:
   PUT SKIP LIST('INDEX TERMS',DUMMY);
   GO TO NEXTREC;
 FINIS:
   END TESTCA;
//GO.CHEMAB DD DSNAME=CAISSV,UNIT=TAPE9,VOLUME=SER=000000,
// DCB=(RECFM=VB,BLKSIZE=1000,LRECL=996),DISP=OLD,LABEL=(1,SL,,IN)
APPENDIX G

INVENTORY OF PROGRAMS AND SUBROUTINES

1.  PROGRAM:      INTX

    AUTHOR:       STEVE SILVER

    STATUS:       IN DEVELOPMENT

    DESCRIPTION:  INTX is a system for assembling, loading, and
                  interactively interpreting 360 assembly language
                  source programs under the UCLA URSA time-sharing
                  system. It permits URSA customers to produce
                  completely interactive programs without any risk
                  of system damage--a feature that is not currently
                  guaranteed by true URSA processors. The system
                  is fully operational now but minor tuning and
                  development will continue.

2.  PROGRAM:      FMS

    AUTHOR:       STEVE SILVER

    STATUS:       COMPLETE

    DESCRIPTION:  FMS is designed for high speed, cheap generation
                  of natural language documents from mixed text and
                  command input. It is very useful for producing
                  final copies of technical reports requiring
                  extensive minor alteration before publication.
                  The system needs some work before the current
                  version can be released as safe.



3.  PROGRAM:      DISCUS

    AUTHOR:       STEVE SILVER

    STATUS:       IN DEVELOPMENT

    DESCRIPTION:  DISCUS is a computer aided instruction system
                  oriented towards CRT graphic terminals. It is a
                  two part system: a compiler that runs on any 360
                  and an executer whose I/O package must be tailored
                  to the individual time-sharing system. It is
                  currently under development but is expected to be
                  completed very shortly.



4.  PROGRAM:      FAMULUS

    AUTHOR:       JERRY PINE

    STATUS:       COMPLETE

    DESCRIPTION:  FAMULUS is designed to process personal reference
                  collections maintained by researchers. However,
                  its basic structure renders it suitable for a
                  large number of other applications. For this
                  purpose it can be regarded as a general-purpose
                  text-handling system.

                  FAMULUS will maintain many types of information
                  files which can be broken into units or records
                  with sub-categories or fields which can be
                  identified. In a personnel information file, the
                  data on each person comprises one record. The
                  record may have up to 10 distinct fields in which
                  are entered name, date of birth, job title, etc.
                  In bibliographic files, the citation is the record,
                  and fields are used for author, title, date,
                  keywords, abstracts, etc.



5.  PROGRAM:      WORDLST

    AUTHOR:       LINDA MIROFF

    STATUS:       COMPLETE

    DESCRIPTION:  WORDLST is a PL/I subroutine which breaks a given
                  character string into individual words. The user
                  specifies characters which are to act as word
                  delimiters; these characters should not be embedded
                  within words. WORDLST stores the individual words
                  without any delimiters in an array. A number giving
                  the relative position of the word in the string is
                  stored in a parallel array. In this way other
                  routines have access to the words as they appear
                  within the string.

                  A list of words, called an exclusion list, may be
                  used to exclude non-content words. Words less than
                  a given length may also be excluded.
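
The behavior described for WORDLST can be sketched as follows. This is a Python illustration, not the PL/I subroutine itself, and the function name, argument order and defaults are hypothetical. "Relative position" is taken here to mean the 1-based character offset at which the word begins; that interpretation is an assumption.

```python
from typing import Iterable, List, Tuple


def wordlst(
    text: str,
    delimiters: str,
    exclusion: Iterable[str] = (),
    min_length: int = 1,
) -> Tuple[List[str], List[int]]:
    """Split text into words; return the words and, in a parallel list,
    the 1-based character position at which each word starts."""
    delims = set(delimiters)
    excluded = {w.upper() for w in exclusion}
    words: List[str] = []
    positions: List[int] = []

    def keep(start: int, end: int) -> None:
        word = text[start:end]
        # Drop words that are too short or on the exclusion list.
        if len(word) >= min_length and word.upper() not in excluded:
            words.append(word)
            positions.append(start + 1)

    start = None                      # start index of the current word
    for i, ch in enumerate(text):
        if ch in delims:
            if start is not None:     # a delimiter closes the word
                keep(start, i)
                start = None
        elif start is None:           # first character of a new word
            start = i
    if start is not None:             # word running to end of string
        keep(start, len(text))
    return words, positions
```

Returning two parallel lists mirrors the parallel arrays mentioned in the description.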



6.  PROGRAM:      CONTEXT

    AUTHOR:       LINDA MIROFF

    STATUS:       COMPLETE

    DESCRIPTION:  CONTEXT is a PL/I subroutine which does most of the
                  "work" involved in building a KWIC index. It finds
                  the contexts of given words in a given string. It
                  can be used in conjunction with an input routine,
                  the WORDLST routine, a sort routine, and an output
                  routine, to produce a KWIC index.

                  The context of a word consists of as much of the
                  surrounding sentence as can be fitted in the space
                  allowed. The number of characters in the context line
                  is set by the user. When all of the sentence cannot
                  be fitted in, parts of the sentence furthest from the
                  word are truncated. The word always appears in the
                  middle of the line. Wrap-around (from left to right
                  or from right to left) is performed to make use of
                  extra room on either side of the word. When there is
                  space, a [symbol illegible] indicates the beginning
                  of the context; a "+" is used at the end of the
                  context to indicate that part of the sentence is
                  missing.
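
A simplified version of the context line described for CONTEXT can be sketched in Python. The function name kwic_line and its arguments are hypothetical. The sketch places the first character of the keyword at the midpoint of the line, truncates the text furthest from the keyword, and marks a cut-off right end with "+". It deliberately omits wrap-around and the beginning-of-context marker, so it is not a reconstruction of the subroutine.

```python
def kwic_line(sentence: str, start: int, width: int = 40) -> str:
    """Return a fixed-width context line with the keyword, which begins
    at index `start` of `sentence`, starting at the middle column."""
    if width < 2:
        raise ValueError("width must be at least 2")
    if not 0 <= start <= len(sentence):
        raise ValueError("start is outside the sentence")
    half = width // 2            # columns available left of the keyword
    right_room = width - half    # columns for the keyword and what follows

    # Keep only the text nearest the keyword on the left.
    left = sentence[:start][-half:]

    # Keep the keyword and following text; flag truncation with "+".
    right = sentence[start:]
    if len(right) > right_room:
        right = right[:right_room - 1] + "+"

    return left.rjust(half) + right.ljust(right_room)
```

Sorting such lines on the keyword column then gives the familiar aligned KWIC display.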
7.  PROGRAM:      CONTITL

    AUTHOR:       LINDA MIROFF

    STATUS:       COMPLETE

    DESCRIPTION:  CONTITL is a special version of CONTEXT, to be used
                  where the text always consists of titles or
                  single sentences. CONTITL produces KWIC records
                  in the same manner as CONTEXT but does not
                  recognize sentence endings or beginnings. The
                  output produced is like that produced by CONTEXT.

8.  PROGRAM:      DEM01

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  DEM01 reads text from magnetic tape, breaks it
                  into individual words, sorts them alphabetically
                  and in order of decreasing frequency, and lastly,
                  prints each word surrounded by context in order
                  of occurrence in the text. All these tasks are
                  carried out in internal memory. The program
                  performs these tasks on a fairly methodological
                  level. The sort, for example, is simply what is
                  termed a "bubble sort". The text must be limited
                  to 10,000 characters. The latter condition is
                  not serious, however, since this is more than
                  adequate to cope with short texts such as abstracts.
                  The main virtue of this program is that it
                  consists of one deck, and carries out all the
                  tasks in one run. It may, therefore, be useful
                  as a demonstration program for beginners.

9.  PROGRAM:      TEXTMT

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  TEXTMT is somewhat more sophisticated than DEM01
                  and has a few additional capabilities: it does
                  character conversion by table lookup, retrieves
                  index terms in context, and uses a binary tree
                  sort for the dictionary and frequency sort.

                  These operations are carried out in internal memory,
                  and the amount of text processed at any given time
                  is still limited.
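
The binary tree sort mentioned for TEXTMT can be illustrated in Python. The names Node, insert, in_order and dictionary_and_frequency are hypothetical and the details are assumptions; the description gives only the name of the technique. Words are inserted into an unbalanced binary search tree that counts repeats, and an in-order traversal yields the dictionary order. A second tree keyed on (negative count, word) gives the frequency order.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


class Node:
    def __init__(self, key) -> None:
        self.key = key
        self.count = 0
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key) -> Tuple[Node, Node]:
    """Insert key if absent; return (root, node holding key)."""
    if root is None:
        node = Node(key)
        return node, node
    cur = root
    while key != cur.key:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
            cur = cur.right
    return root, cur


def in_order(root: Optional[Node]) -> Iterator[Node]:
    """Visit nodes in ascending key order without recursion."""
    stack: List[Node] = []
    cur = root
    while stack or cur is not None:
        while cur is not None:
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        yield cur
        cur = cur.right


def dictionary_and_frequency(words: Iterable[str]):
    root = None
    for w in words:
        root, node = insert(root, w)
        node.count += 1                      # repeated words are counted
    dictionary = [(n.key, n.count) for n in in_order(root)]

    freq_root = None
    for word, count in dictionary:           # second tree: frequency order
        freq_root, _ = insert(freq_root, (-count, word))
    by_frequency = [(n.key[1], -n.key[0]) for n in in_order(freq_root)]
    return dictionary, by_frequency
```

An unbalanced tree degrades on already-sorted input, which is a known limitation of the plain technique.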



10. PROGRAM:      INDX

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  The INDX program is designed to cope with texts
                  in prose, poetry, and dramatic forms, as well as
                  with transcriptions of oral discourse. The
                  program reads input text from cards and produces
                  records each containing one textual word or
                  punctuation mark, and index information showing
                  where the word/punctuation mark occurred in the
                  text. Each word/punctuation mark is indexed as
                  to volume, chapter, paragraph, sentence, and
                  word-within-sentence location. These records are
                  stored in a temporary file which forms the input
                  to program GAMMA.

11. PROGRAM:      GAMMA

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  Program GAMMA produces a word index and word
                  frequency list from the output of INDX. The
                  word index is in alphabetical order by word and
                  includes frequency count. The word frequency
                  list is ordered alphabetically within descending
                  frequency. GAMMA invokes the IBM sort/merge to
                  do the necessary sorting and prints the final
                  output.

12. PROGRAM:      CONCORD

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  CONCORD reads input text from cards in stream
                  form and produces concordance records (in KWIC
                  format) on each word in the text which is not
                  in an exclusion list. The records are written
                  in a sequential data set which forms the input
                  to program ALPHA.

13. PROGRAM:      ALPHA

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  ALPHA sorts the output records produced by CONCORD
                  into ascending sequence by word and identification.
                  It invokes the IBM sort/merge and prints the
                  concordance, underlining the keyword of each
                  line of text.
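
The sort orders described for GAMMA (item 11) and ALPHA (item 13) can be expressed as sort keys. The Python sketch below uses the language's built-in sort in place of the IBM sort/merge, and the function names and the record layout (keyword, identification, line) are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_index(words: Iterable[str]) -> List[Tuple[str, int]]:
    """Alphabetical word index with a frequency count for each word."""
    return sorted(Counter(words).items())


def frequency_list(words: Iterable[str]) -> List[Tuple[str, int]]:
    """Words ordered alphabetically within descending frequency."""
    return sorted(Counter(words).items(), key=lambda item: (-item[1], item[0]))


def sort_concordance(
    records: Iterable[Tuple[str, int, str]]
) -> List[Tuple[str, int, str]]:
    """Order concordance records by keyword, then by identification."""
    return sorted(records, key=lambda rec: (rec[0], rec[1]))
```

Negating the count in the key is what puts the most frequent words first while keeping ties in alphabetical order.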
14. PROGRAM:      ASPEN (California Code and State Constitution)
                  Read Routine

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  This routine reads magnetic tape records of the
                  Aspen file (containing full-text images of 298
                  documents from the California Code and State
                  Constitution), identifies each field within a
                  record, and provides a detailed printout for each
                  record type.

15. PROGRAM:      COMPENDEX Read Routine

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  This routine reads magnetic tape records of the
                  Computerized Engineering Index file (containing
                  bibliographic information), identifies each field
                  within a record, and provides a detailed printout
                  for each record type.

16. PROGRAM:      CHEMICAL ABSTRACTS CONDENSATE Read Routine

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  This routine reads magnetic tape records of the
                  Chemical Abstracts Condensate file (containing
                  bibliographic information), identifies each field
                  within a record, and provides a detailed printout
                  for each record type.

17. PROGRAM:      COMMUNICATIONS OF BEHAVIORAL BIOLOGY (CBB)
                  Read Routine

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  This routine reads magnetic tape records of the
                  Communications of Behavioral Biology file
                  (containing bibliographic information), identifies
                  each field within a record, and provides a detailed
                  printout for each record type.

18. PROGRAM:      NUCLEAR SCIENCE ABSTRACTS (NSA) ENTRY Read Routine

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  This routine reads magnetic tape records of the
                  Nuclear Science Abstracts Entry file (containing
                  bibliographic information), identifies each field
                  within a record, and provides a detailed printout
                  for each record type.
19. PROGRAM:      NUCLEAR SCIENCE ABSTRACTS (NSA) SELECTOR
                  Read Routine

    AUTHOR:       GEORGETTE SILVA

    STATUS:       COMPLETE

    DESCRIPTION:  This routine reads magnetic tape records of the
                  Nuclear Science Abstracts Selector file (containing
                  condensed bibliographic information), identifies
                  each field within a record, and provides a detailed
                  printout for each record type.



20. PROGRAM:      BINCDF

    AUTHOR:       STUART BEAL

    STATUS:       COMPLETE

    DESCRIPTION:  A FORTRAN function for computing the binomial
                  cumulative distribution.
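
The quantity computed by a binomial cumulative distribution function is P(X <= k) for X distributed as Binomial(n, p). The Python sketch below shows a direct summation of the probability terms; the function name and argument order are assumptions, and nothing here is taken from the FORTRAN source.

```python
from math import comb


def bincdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p) by summing the terms
    C(n, i) * p**i * (1 - p)**(n - i) for i = 0 .. k."""
    if n < 0:
        raise ValueError("n must not be negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    q = 1.0 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(k + 1))
```

Direct summation is adequate for modest n; for very large n a more careful numerical method would be preferable.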



21. PROGRAM:      PARAMETER P TEST

    AUTHOR:       STUART BEAL

    STATUS:       COMPLETE

    DESCRIPTION:  A FORTRAN program which tests the P parameter of
                  the binomial distribution.



22. PROGRAM:      ERIC

    AUTHOR:       AEINT DE BOER

    STATUS:       COMPLETE

    DESCRIPTION:  The ERIC file search system will search, maintain and
                  provide listings from the ERIC file.

                  a) MULFSCH -- This is a batch program to search an
                     inverted index to an ERIC file or a sequential
                     ERIC file by a Boolean combination of descriptors
                     or a list of accession numbers. It is designed
                     to be extended to search other files.

                  b) ERICMIF -- This is a batch program to create an
                     inverted index to an ERIC file. It reads a
                     sequential ERIC file, extracts the descriptors
                     and accession numbers, and writes a direct access
                     file of descriptors with corresponding accession
                     numbers. This file can be used by MULFSCH,
                     ERICIFL, and ERICDFL.

                  c) ERICIFL -- This is a batch program to list an
                     inverted index to an ERIC file. The listing
                     includes descriptors and frequency of use and can
                     be either one up or two up. The one up listing can
                     include the corresponding accession numbers as an
                     option. The listing is ascending by accession
                     number within descriptor.

                  d) ERICDFL -- This is a batch program to list an
                     inverted index to an ERIC file. The listing
                     includes descriptors and frequency of use and is
                     ordered by descending frequency of use.
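
The inverted-index approach described for the ERIC programs can be sketched in a few lines of Python. All names below are hypothetical, the sample accession numbers used in any test are invented, and the in-memory dictionary stands in for the direct access file. The sketch covers index creation (as in ERICMIF), Boolean AND/OR search (as in MULFSCH), and a descending-frequency listing (as in ERICDFL).

```python
from typing import Dict, Iterable, List, Set, Tuple

Index = Dict[str, Set[str]]


def build_inverted_index(records: Iterable[Tuple[str, List[str]]]) -> Index:
    """Map each descriptor to the set of accession numbers using it."""
    index: Index = {}
    for accession, descriptors in records:
        for descriptor in descriptors:
            index.setdefault(descriptor, set()).add(accession)
    return index


def search_and(index: Index, *descriptors: str) -> Set[str]:
    """Accession numbers indexed under every given descriptor."""
    sets = [index.get(d, set()) for d in descriptors]
    return set.intersection(*sets) if sets else set()


def search_or(index: Index, *descriptors: str) -> Set[str]:
    """Accession numbers indexed under at least one given descriptor."""
    result: Set[str] = set()
    for d in descriptors:
        result |= index.get(d, set())
    return result


def list_by_frequency(index: Index) -> List[Tuple[str, int]]:
    """Descriptors with frequency of use, most frequent first."""
    counts = [(d, len(accs)) for d, accs in index.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```

Because each descriptor already carries its accession numbers, a Boolean query reduces to set operations and never rereads the sequential file.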



23. PROGRAM:      CENS60V

    AUTHOR:       AEINT DE BOER

    STATUS:       COMPLETE

    DESCRIPTION:  CENS60V is a batch program to verify a set of
                  documented universes for the 1960 1/1000 sample
                  census tape.



24. PROGRAM:      UTILITY PROGRAMS

    AUTHOR:       AEINT DE BOER

    STATUS:       COMPLETE

    DESCRIPTION:  a) STIMER (a modification of a routine borrowed
                     from Stu Beal) is a short assembler language
                     subroutine to enable a PL/I programmer to measure
                     elapsed task CPU time and elapsed real time.

                  b) PGMPRNT is a batch program to print programs in
                     a format suitable for inclusion in a thesis.
                     Includes provisions for a figure title, figure
                     number, number of parts, statement numbers and
                     page numbers.
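
The kind of measurement STIMER provides (elapsed CPU time alongside elapsed real time) has a close analogue in most modern languages. The Python sketch below uses the standard library's time.process_time and time.perf_counter; the class name Stimer and its methods are hypothetical. It measures CPU time for the whole process rather than for an OS/360 task, so it is only an analogy to the assembler subroutine.

```python
import time
from typing import Tuple


class Stimer:
    """Report CPU seconds and real (wall-clock) seconds since reset."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self._cpu0 = time.process_time()    # CPU time used by this process
        self._real0 = time.perf_counter()   # monotonic wall-clock time

    def elapsed(self) -> Tuple[float, float]:
        """Return (cpu_seconds, real_seconds) since the last reset."""
        return (
            time.process_time() - self._cpu0,
            time.perf_counter() - self._real0,
        )
```

Comparing the two figures shows how much of a run was spent computing and how much waiting, for instance on tape input.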